The human visual system and CNNs can both support robust online translation tolerance following extreme displacements

Ryan Blything; Valerio Biscione; Ivan Vankov; Casimir J.H. Ludwig; Jeffrey Bowers

doi:10.1167/jov.21.2.9

Research output: Contribution to journal › Article › peer-review

Abstract

Visual translation tolerance refers to our capacity to recognize objects over a wide range of different retinal locations. Although translation is perhaps the simplest spatial transform that the visual system needs to cope with, the extent to which the human visual system can identify objects at previously unseen locations is unclear, with some studies reporting near-complete invariance over 10 degrees and others reporting zero invariance at 4 degrees of visual angle. Similarly, there is confusion regarding the extent of translation tolerance in computational models of vision, as well as the degree of match between human and model performance. Here, we report a series of eye-tracking studies (total N = 70) demonstrating that novel objects trained at one retinal location can be recognized at high accuracy rates following translations up to 18 degrees. We also show that standard deep convolutional neural networks (DCNNs) support our findings when pretrained to classify another set of stimuli across a range of locations, or when a global average pooling (GAP) layer is added to produce larger receptive fields. Our findings provide a strong constraint for theories of human vision and help explain inconsistent findings previously reported with convolutional neural networks (CNNs).
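As a rough illustration of why a global average pooling (GAP) layer can make a convolutional feature insensitive to object position, the following is a minimal NumPy sketch. It is not the authors' networks or training setup: the single fixed kernel, the toy 5x5 "object", the 32x32 canvas, and the helper names `conv2d_valid`, `place`, and `global_average_pool` are all assumptions made for the example. The same object is placed at two distant locations; the convolutional feature maps differ, but their global averages agree.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Plain 2-D cross-correlation with 'valid' padding (no bias term)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def place(patch, canvas_shape, top, left):
    """Put the patch on an otherwise empty (all-zero) canvas."""
    canvas = np.zeros(canvas_shape)
    canvas[top:top + patch.shape[0], left:left + patch.shape[1]] = patch
    return canvas


def global_average_pool(feature_map):
    """Collapse a feature map to one number by averaging over all positions."""
    return feature_map.mean()


# Toy "object" and a fixed Sobel-like kernel (both illustrative assumptions).
patch = np.arange(1.0, 26.0).reshape(5, 5)
kernel = np.array([[1.0, 0.0, -1.0],
                   [2.0, 0.0, -2.0],
                   [1.0, 0.0, -1.0]])

# Same object at two distant positions, both well away from the border.
img_a = place(patch, (32, 32), top=4, left=4)
img_b = place(patch, (32, 32), top=20, left=22)

# Convolution followed by ReLU: the response pattern shifts with the object.
fm_a = np.maximum(conv2d_valid(img_a, kernel), 0.0)
fm_b = np.maximum(conv2d_valid(img_b, kernel), 0.0)

# GAP discards where the response occurred and keeps only how much there was.
gap_a = global_average_pool(fm_a)
gap_b = global_average_pool(fm_b)

print("feature maps identical:", np.allclose(fm_a, fm_b))
print("GAP outputs match:     ", np.isclose(gap_a, gap_b))
```

The equality holds here only because the object stays far enough from the image border and the background is empty; with clutter, padding effects, or strided layers the match would be approximate at best.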

Original language	English
Article number	9
Pages (from-to)	1-16
Number of pages	16
Journal	Journal of Vision
Volume	21
Issue number	2
DOIs	https://doi.org/10.1167/jov.21.2.9
Publication status	Published - 23 Feb 2021

Keywords

convolutional neural networks
global average pooling (GAP)
object recognition
translation invariance
translation tolerance

Access to Document

10.1167/jov.21.2.9 (Licence: CC BY-NC-ND 3.0)

The human visual system and CNNs
Copyright 2021 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
Final published version, 3.69 MB (Licence: CC BY-NC-ND 3.0)

Cite this

@article{159af3c05a0a445f93a620039c03e0f0,

title = "The human visual system and CNNs can both support robust online translation tolerance following extreme displacements",

abstract = "Visual translation tolerance refers to our capacity to recognize objects over a wide range of different retinal locations. Although translation is perhaps the simplest spatial transform that the visual system needs to cope with, the extent to which the human visual system can identify objects at previously unseen locations is unclear, with some studies reporting near complete invariance over 10 degrees and other reporting zero invariance at 4 degrees of visual angle. Similarly, there is confusion regarding the extent of translation tolerance in computational models of vision, as well as the degree of match between human and model performance. Here, we report a series of eye-tracking studies (total N = 70) demonstrating that novel objects trained at one retinal location can be recognized at high accuracy rates following translations up to 18 degrees. We also show that standard deep convolutional neural networks (DCNNs) support our findings when pretrained to classify another set of stimuli across a range of locations, or when a global average pooling (GAP) layer is added to produce larger receptive fields. Our findings provide a strong constraint for theories of human vision and help explain inconsistent findings previously reported with convolutional neural networks (CNNs).",

keywords = "convolutional neural networks, global average pooling (GAP), object recognition, translation invariance, translation tolerance",

author = "Blything, Ryan and Biscione, Valerio and Vankov, Ivan and Ludwig, {Casimir J.H.} and Bowers, Jeffrey",

year = "2021",

month = feb,

day = "23",

doi = "10.1167/jov.21.2.9",

language = "English",

volume = "21",

pages = "1--16",

journal = "Journal of Vision",

issn = "1534-7362",

publisher = "Association for Research in Vision and Ophthalmology Inc.",

number = "2",

}

TY - JOUR

T1 - The human visual system and CNNs can both support robust online translation tolerance following extreme displacements

AU - Blything, Ryan

AU - Biscione, Valerio

AU - Vankov, Ivan

AU - Ludwig, Casimir J.H.

AU - Bowers, Jeffrey

PY - 2021/2/23

Y1 - 2021/2/23

N2 - Visual translation tolerance refers to our capacity to recognize objects over a wide range of different retinal locations. Although translation is perhaps the simplest spatial transform that the visual system needs to cope with, the extent to which the human visual system can identify objects at previously unseen locations is unclear, with some studies reporting near complete invariance over 10 degrees and other reporting zero invariance at 4 degrees of visual angle. Similarly, there is confusion regarding the extent of translation tolerance in computational models of vision, as well as the degree of match between human and model performance. Here, we report a series of eye-tracking studies (total N = 70) demonstrating that novel objects trained at one retinal location can be recognized at high accuracy rates following translations up to 18 degrees. We also show that standard deep convolutional neural networks (DCNNs) support our findings when pretrained to classify another set of stimuli across a range of locations, or when a global average pooling (GAP) layer is added to produce larger receptive fields. Our findings provide a strong constraint for theories of human vision and help explain inconsistent findings previously reported with convolutional neural networks (CNNs).

AB - Visual translation tolerance refers to our capacity to recognize objects over a wide range of different retinal locations. Although translation is perhaps the simplest spatial transform that the visual system needs to cope with, the extent to which the human visual system can identify objects at previously unseen locations is unclear, with some studies reporting near complete invariance over 10 degrees and other reporting zero invariance at 4 degrees of visual angle. Similarly, there is confusion regarding the extent of translation tolerance in computational models of vision, as well as the degree of match between human and model performance. Here, we report a series of eye-tracking studies (total N = 70) demonstrating that novel objects trained at one retinal location can be recognized at high accuracy rates following translations up to 18 degrees. We also show that standard deep convolutional neural networks (DCNNs) support our findings when pretrained to classify another set of stimuli across a range of locations, or when a global average pooling (GAP) layer is added to produce larger receptive fields. Our findings provide a strong constraint for theories of human vision and help explain inconsistent findings previously reported with convolutional neural networks (CNNs).

KW - convolutional neural networks

KW - global average pooling (GAP)

KW - object recognition

KW - translation invariance

KW - translation tolerance

UR - https://jov.arvojournals.org/article.aspx?articleid=2772320

UR - http://www.scopus.com/inward/record.url?scp=85101527154&partnerID=8YFLogxK

U2 - 10.1167/jov.21.2.9

DO - 10.1167/jov.21.2.9

M3 - Article

C2 - 33620380

SN - 1534-7362

VL - 21

SP - 1

EP - 16

JO - Journal of Vision

JF - Journal of Vision

IS - 2

M1 - 9

ER -
