Are there any ‘object detectors’ in the hidden layers of CNNs trained to identify objects or scenes?

Ella Gale; Nicholas Martin; Ryan Blything; Anh Nguyen; Jeffrey Bowers

doi:10.1016/j.visres.2020.06.007

School of Psychology

Research output: Contribution to journal › Article › peer-review

Abstract

Various methods of measuring unit selectivity have been developed with the aim of better understanding how neural networks work. But the different measures provide divergent estimates of selectivity, and this has led to different conclusions regarding the conditions in which selective object representations are learned and the functional relevance of these representations. In an attempt to better characterize object selectivity, we undertake a comparison of various selectivity measures on a large set of units in AlexNet, including localist selectivity [7], precision [42], class-conditional mean activity selectivity (CCMAS) [22], the human interpretation of activation maximization (AM) images, and standard signal-detection measures. We find that the different measures provide different estimates of object selectivity, with precision and CCMAS measures providing misleadingly high estimates. Indeed, the most selective units had a poor hit-rate or a high false-alarm rate (or both) in object classification, making them poor object detectors. We fail to find any units that are even remotely as selective as the ‘grandmother cell’ units reported in recurrent neural networks. In order to generalize these results, we compared selectivity measures on units in VGG-16 and GoogLeNet trained on the ImageNet or Places-365 datasets that have been described as ‘object detectors’ according to network dissection [43]. Again, we find poor hit-rates and high false-alarm rates for object classification. We conclude that signal-detection measures provide a better assessment of single-unit selectivity compared to common alternative approaches, and that deep convolutional networks of image classification do not learn object detectors in their hidden layers.
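
The gap the abstract describes between precision or CCMAS and signal-detection measures can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy activations and labels are invented, the function names are hypothetical, and the formulas follow the commonly cited definitions (CCMAS as a normalized difference of class means, precision as the share of the top-n activating images from one class, and hit and false-alarm rates from a fixed activation threshold). It is not the authors' code and may differ in detail from the measures used in the paper.

```python
from collections import Counter, defaultdict


def ccmas(activations, labels):
    """Class-conditional mean activity selectivity for a single unit.

    Assumed form: (mu_max - mu_rest) / (mu_max + mu_rest), where mu_max is the
    mean activation of the class with the highest mean and mu_rest is the mean
    activation over all images from the remaining classes. Assumes
    non-negative activations (e.g. after a ReLU) that are not all zero.
    """
    by_class = defaultdict(list)
    for a, y in zip(activations, labels):
        by_class[y].append(a)
    means = {y: sum(v) / len(v) for y, v in by_class.items()}
    top = max(means, key=means.get)
    rest = [a for a, y in zip(activations, labels) if y != top]
    mu_max, mu_rest = means[top], sum(rest) / len(rest)
    return top, (mu_max - mu_rest) / (mu_max + mu_rest)


def precision_top_n(activations, labels, n):
    """Share of the n most strongly activating images that belong to the
    most common class among those n images."""
    ranked = sorted(zip(activations, labels), key=lambda p: p[0], reverse=True)[:n]
    cls, count = Counter(y for _, y in ranked).most_common(1)[0]
    return cls, count / len(ranked)


def hit_and_false_alarm(activations, labels, target, threshold):
    """Treat 'activation > threshold' as the unit reporting `target`."""
    hits = sum(1 for a, y in zip(activations, labels) if y == target and a > threshold)
    false_alarms = sum(1 for a, y in zip(activations, labels) if y != target and a > threshold)
    n_target = sum(1 for y in labels if y == target)
    n_other = len(labels) - n_target
    return hits / n_target, false_alarms / n_other


# Invented toy data: one unit, 30 images, 3 classes. The unit fires strongly
# for only 2 of the 10 "dog" images and weakly for one "cat" and one "car".
acts = [10.0, 9.0] + [0.0] * 8 + [1.0] + [0.0] * 9 + [0.5] + [0.0] * 9
labels = ["dog"] * 10 + ["cat"] * 10 + ["car"] * 10

top_class, selectivity = ccmas(acts, labels)
precision_class, precision = precision_top_n(acts, labels, n=2)
hit_rate, false_alarm_rate = hit_and_false_alarm(acts, labels, "dog", threshold=5.0)
print(top_class, round(selectivity, 3), precision, hit_rate, false_alarm_rate)
```

On this toy unit, precision and CCMAS are both high for "dog" even though the unit responds above threshold to only a small fraction of dog images. This is the kind of mismatch the abstract attributes to units with a poor hit rate.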

Original language: English
Pages (from-to): 60-71
Journal: Vision Research
Volume: 176
Early online date: 8 Aug 2020
DOI: https://doi.org/10.1016/j.visres.2020.06.007
Publication status: Published, 1 Nov 2020

https://research-information.bris.ac.uk/en/publications/are-there-any-object-detectors-in-the-hidden-layers-of-cnns-train

Cite this

@article{ab2579bc4cf54edf84e4b5c0b7f8aa59,

title = "Are there any {\textquoteleft}object detectors{\textquoteright} in the hidden layers of CNNs trained to identify objects or scenes?",

abstract = "Various methods of measuring unit selectivity have been developed with the aim of better understanding how neural networks work. But the different measures provide divergent estimates of selectivity, and this has led to different conclusions regarding the conditions in which selective object representations are learned and the functional relevance of these representations. In an attempt to better characterize object selectivity, we undertake a comparison of various selectivity measures on a large set of units in AlexNet, including localist selectivity [7], precision [42], class-conditional mean activity selectivity (CCMAS) [22], the human interpretation of activation maximization (AM) images, and standard signal-detection measures. We find that the different measures provide different estimates of object selectivity, with precision and CCMAS measures providing misleadingly high estimates. Indeed, the most selective units had a poor hit-rate or a high false-alarm rate (or both) in object classification, making them poor object detectors. We fail to find any units that are even remotely as selective as the {\textquoteleft}grandmother cell{\textquoteright} units reported in recurrent neural networks. In order to generalize these results, we compared selectivity measures on units in VGG-16 and GoogLeNet trained on the ImageNet or Places-365 datasets that have been described as {\textquoteleft}object detectors{\textquoteright} according to network dissection [43]. Again, we find poor hit-rates and high false-alarm rates for object classification. We conclude that signal-detection measures provide a better assessment of single-unit selectivity compared to common alternative approaches, and that deep convolutional networks of image classification do not learn object detectors in their hidden layers.",

author = "Ella Gale and Nicholas Martin and Ryan Blything and Anh Nguyen and Jeffrey Bowers",

year = "2020",

month = nov,

day = "1",

doi = "10.1016/j.visres.2020.06.007",

language = "English",

volume = "176",

pages = "60--71",

journal = "Vision Research",