Abstract
We address the problem of image-to-video retrieval. Given a query image, the aim
is to identify the frame or scene within a collection of videos that best matches the visual
input. Matching images to videos is an asymmetric task: it requires features that capture
the visual information in images and, at the same time, compact the temporal correlation
in videos. Methods proposed so far are based on the temporal aggregation of hand-crafted
features. In this work, we propose a deep learning architecture for learning specific
asymmetric spatio-temporal embeddings for image-to-video retrieval. Our method learns
non-linear projections from training data for both images and videos and projects their
visual content into a common latent space, where they can be easily compared with a
standard similarity function. Our experiments show that the proposed asymmetric
spatio-temporal embeddings outperform the state of the art on standard image-to-video
retrieval datasets.
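The embedding-and-retrieval pipeline described above can be sketched as follows. This is a minimal, hypothetical illustration in plain Python, not the authors' architecture. Several choices are assumptions: the layer sizes (`FEAT_DIM`, `EMBED_DIM`), the two-layer ReLU branches, temporal mean pooling in the video branch, and cosine similarity as the "standard similarity function". The weights are also random placeholders, not parameters learned from training data.

```python
import math
import random

# Hypothetical dimensions, chosen only for illustration.
FEAT_DIM = 8    # size of each input feature vector (one image or one video frame)
EMBED_DIM = 4   # size of the shared latent space

def make_layer(n_in, n_out, rng):
    """Random (n_out x n_in) weights and zero bias; placeholders for learned parameters."""
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def dense(x, layer):
    """Affine map W x + b."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def l2_normalize(x):
    norm = math.sqrt(sum(v * v for v in x)) or 1.0  # leave an all-zero vector unchanged
    return [v / norm for v in x]

class AsymmetricEmbedder:
    """Separate non-linear branches that map images and videos into one common space."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # Image branch: two-layer MLP applied to a single feature vector.
        self.img1 = make_layer(FEAT_DIM, 16, rng)
        self.img2 = make_layer(16, EMBED_DIM, rng)
        # Video branch: per-frame projection, temporal mean pooling, then a projection.
        self.vid1 = make_layer(FEAT_DIM, 16, rng)
        self.vid2 = make_layer(16, EMBED_DIM, rng)

    def embed_image(self, feat):
        return l2_normalize(dense(relu(dense(feat, self.img1)), self.img2))

    def embed_video(self, frames):
        hidden = [relu(dense(f, self.vid1)) for f in frames]
        pooled = [sum(col) / len(hidden) for col in zip(*hidden)]  # average over time
        return l2_normalize(dense(pooled, self.vid2))

def cosine(a, b):
    # Embeddings are unit-length, so their dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def rank_videos(model, query_feat, videos):
    """Return candidate indices (videos or scenes) sorted from most to least similar."""
    q = model.embed_image(query_feat)
    scores = [(cosine(q, model.embed_video(frames)), idx) for idx, frames in enumerate(videos)]
    scores.sort(reverse=True)
    return [idx for _, idx in scores]
```

Because the weights are untrained, the ranking produced here is arbitrary. In practice both branches would be trained jointly so that an image lands close to the video or scene it depicts; the abstract does not specify the training objective. Keeping two separate branches is what makes the embedding asymmetric: the image branch sees a single feature vector, while the video branch must first summarize a sequence of frames.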
Original language | English |
---|---|
Title of host publication | 29TH BRITISH MACHINE VISION CONFERENCE |
Publication status | Published - 6 Sept 2018 |
Event | 29TH BRITISH MACHINE VISION CONFERENCE - Newcastle, United Kingdom; Duration: 3 Sept 2018 → 6 Sept 2018 |
Conference
Conference | 29TH BRITISH MACHINE VISION CONFERENCE |
---|---|
Country/Territory | United Kingdom |
City | Newcastle |
Period | 3/09/18 → 6/09/18 |
Bibliographical note
© 2018. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.