How background affects image classification in popular pretrained models

Photo by Karina B. on Unsplash


The paper of interest this time is Noise or Signal: The Role of Image Backgrounds in Object Recognition; all credit goes to the authors for their excellent work. The authors investigate how the background affects the accuracy of image classification models and, in doing so, uncover an interesting relationship between the background and the foreground of an image. Interestingly, they find that models depend surprisingly heavily on the background when classifying objects, and can make reasonably good predictions from the background alone! More alarming still, models tend to misclassify images whose foreground is perfectly recognisable when that foreground is not paired with a familiar background. The third observation is that more accurate models tend to depend less on the background. For these experiments the authors curated a dataset called ImageNet-9, in which the background is separated from the foreground, and then evaluated state-of-the-art models on it. The dataset is further divided into seven variations that differ in how they process the background and the foreground.

Image source: arxiv.org/abs/2006.09994
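To make the idea of separating background from foreground concrete, here is a minimal sketch of how a few background/foreground variants could be built from one image. It assumes a boolean foreground mask is already available (how the authors actually obtain their segmentations is not shown here), and the function name `make_variants` and the variant keys are hypothetical, only loosely mirroring the paper's "only foreground", "no foreground" and "mixed background" ideas.

```python
import numpy as np

def make_variants(image, fg_mask, other_bg):
    """Build a few ImageNet-9-style variants from one image (illustrative only).

    image:    H x W x 3 float array
    fg_mask:  H x W boolean array, True where the object (foreground) is
    other_bg: H x W x 3 float array, a background taken from another image
    """
    mask3 = fg_mask[..., None]  # broadcast the mask over the colour channels
    only_fg = np.where(mask3, image, 0.0)     # object on a black background
    no_fg = np.where(mask3, 0.0, image)       # background with the object blacked out
    mixed = np.where(mask3, image, other_bg)  # object pasted onto another background
    return {"only_fg": only_fg, "no_fg": no_fg, "mixed": mixed}
```

Comparing a model's accuracy across such variants is what reveals how much it leans on the background rather than the object.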

Salient findings from the experiment:

  1. Models can classify images accurately from the background alone
  2. Models take the background signal into account when making decisions
  3. Models perform poorly when presented with the same foreground on an adversarially chosen background
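One simple way to probe the third finding is a brute-force search: keep the foreground fixed, try each candidate background, and stop at the first one that changes the prediction. The sketch below assumes a foreground mask and a list of candidate backgrounds are given; `find_flipping_background` and the toy `brightness_classifier` are hypothetical stand-ins, not the authors' code, and a real experiment would plug in a trained model's prediction function instead.

```python
import numpy as np

def find_flipping_background(image, fg_mask, backgrounds, predict, true_label):
    """Return the index of the first background that changes the prediction, else None.

    image:       H x W x 3 float array containing the object
    fg_mask:     H x W boolean array, True on the object
    backgrounds: iterable of H x W x 3 candidate backgrounds
    predict:     any callable mapping an H x W x 3 array to a class index
    """
    mask3 = fg_mask[..., None]
    for i, bg in enumerate(backgrounds):
        composite = np.where(mask3, image, bg)  # same foreground, new background
        if predict(composite) != true_label:
            return i
    return None


# Toy "classifier" (hypothetical) that only looks at overall brightness,
# so it is easy to fool by changing the background.
def brightness_classifier(x):
    return int(x.mean() > 0.5)

img = np.ones((4, 4, 3))
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
candidates = [np.ones((4, 4, 3)), np.zeros((4, 4, 3))]
flip = find_flipping_background(img, mask, candidates, brightness_classifier, true_label=1)
print(flip)  # 1: the dark background pulls the mean brightness down to 0.25
```

If a search like this succeeds for many foregrounds, the model's decision is being driven by the background rather than by the object itself.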

Some measures to mitigate such problems:

  1. Train models on images where the background is decoupled from the labels
  2. Apply targeted image augmentation techniques to reduce this correlation
  3. Use training algorithms such as distributionally robust optimization [Sag+20] and model-based robust learning [RHP20]
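The first two measures can be illustrated with a simple augmentation: paste each training foreground onto a randomly chosen background while keeping its label, so that no background stays correlated with any particular class. This is a minimal sketch assuming foreground masks and a pool of background-only images are available; `random_background_swap` is a hypothetical helper, and it is a plain random re-pairing, not the distributionally robust or model-based methods cited in item 3.

```python
import numpy as np

def random_background_swap(images, fg_masks, backgrounds, rng):
    """Paste each foreground onto a randomly chosen background (labels unchanged).

    images:      N x H x W x 3 array
    fg_masks:    N x H x W boolean array, True on the object
    backgrounds: M x H x W x 3 array of background-only images (assumed available)
    rng:         a numpy Generator, e.g. np.random.default_rng(0)
    """
    idx = rng.integers(0, len(backgrounds), size=len(images))  # one background per image
    chosen = backgrounds[idx]                                  # N x H x W x 3
    return np.where(fg_masks[..., None], images, chosen)       # keep object, swap the rest
```

Because labels are left untouched, over many epochs every background ends up paired with every class, which removes the shortcut of predicting the label from the background.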

My takeaways

Machine learning models don't necessarily work the way we want them to, even if they seem to produce the desired outputs. The fact that models use both the background and the foreground may or may not be a good thing, since humans also use contextual information to recognise objects. However, humans are also good at recognising those objects when they are detached from their familiar context. Furthermore, strongly coupling the background to the object may be undesirable in many practical scenarios; for example, we would want a cat classifier to recognise a cat regardless of whether it is at home or out in the deep wilderness. I think that, besides uncovering how popular models behave when classifying images, this study also encourages us to think more about how ML models learn contextual information and use it to understand the world. An interesting follow-up study would be to understand how ML models correlate context with the object of interest, in order to build more robust models.