Asymmetric Spatio-Temporal Embeddings for Large-Scale Image-to-Video Retrieval

Noa Garcia, George Vogiatzis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We address the problem of image-to-video retrieval: given a query image, the aim
is to identify the frame or scene within a collection of videos that best matches the
visual input. Matching images to videos is an asymmetric task that requires specific
features capturing the visual information in images while, at the same time,
compacting the temporal correlation from videos. Methods proposed so far are based
on the temporal aggregation of hand-crafted features. In this work, we propose a
deep learning architecture for learning specific asymmetric spatio-temporal
embeddings for image-to-video retrieval. Our method learns non-linear projections
from training data for both images and videos and projects their visual content
into a common latent space, where they can be easily compared with a standard
similarity function. Experiments show that our proposed asymmetric spatio-temporal
embeddings outperform the state of the art on standard image-to-video retrieval datasets.
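The core idea of the abstract — two different (asymmetric) projections, one per modality, mapping images and temporally aggregated videos into a shared space where a standard similarity suffices — can be sketched as follows. This is a minimal toy illustration, not the paper's architecture: the dimensions, the tanh non-linearity, the random weights, and the mean pooling over frames are all assumptions standing in for the learned deep projections described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

D_IMG, D_VID, D_EMB = 512, 512, 128  # hypothetical feature/embedding sizes

# Asymmetric branches: a separate projection per modality
# (in the paper these are learned from training data)
W_img = rng.standard_normal((D_IMG, D_EMB)) / np.sqrt(D_IMG)
W_vid = rng.standard_normal((D_VID, D_EMB)) / np.sqrt(D_VID)

def embed_image(feat):
    """Non-linearly project a single image descriptor into the latent space."""
    z = np.tanh(feat @ W_img)
    return z / np.linalg.norm(z)       # L2-normalise for cosine similarity

def embed_video(frames):
    """Compact per-frame descriptors over time, then project.

    Mean pooling is a placeholder for the temporal aggregation
    learned by the video branch.
    """
    pooled = frames.mean(axis=0)       # (num_frames, D_VID) -> (D_VID,)
    z = np.tanh(pooled @ W_vid)
    return z / np.linalg.norm(z)

# Toy retrieval: rank 3 video shots against one query image
query = rng.standard_normal(D_IMG)
shots = [rng.standard_normal((20, D_VID)) for _ in range(3)]

q = embed_image(query)
scores = [float(q @ embed_video(s)) for s in shots]  # cosine similarity
best = int(np.argmax(scores))          # index of the best-matching shot
```

Because both modalities land in the same L2-normalised space, retrieval reduces to a dot product, which is what makes the approach scale to large video collections.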
Original language: English
Title of host publication: 29TH BRITISH MACHINE VISION CONFERENCE
Publication status: Published - 6 Sep 2018
Event: 29TH BRITISH MACHINE VISION CONFERENCE - Newcastle, United Kingdom
Duration: 3 Sep 2018 - 6 Sep 2018

Conference

Conference: 29TH BRITISH MACHINE VISION CONFERENCE
Country: United Kingdom
City: Newcastle
Period: 3/09/18 - 6/09/18


Bibliographical note

© 2018. The copyright of this document resides with its authors.
It may be distributed unchanged freely in print or electronic forms.

Cite this

Garcia, N., & Vogiatzis, G. (2018). Asymmetric Spatio-Temporal Embeddings for Large-Scale Image-to-Video Retrieval. In 29TH BRITISH MACHINE VISION CONFERENCE.