Spatial and temporal representations for multi-modal visual retrieval

  • Noa Garcia Docampo

Student thesis: Doctoral Thesis, Doctor of Philosophy


This dissertation studies the problem of finding relevant content within a
visual collection given a specific query. Depending on the kind of data to be
processed, it addresses three key tasks: symmetric visual retrieval,
asymmetric visual retrieval and cross-modal retrieval.
In symmetric visual retrieval, the query object and the elements in the collection
are from the same kind of visual data, i.e. images or videos. Inspired by the
human visual perception system, we propose new techniques that estimate visual
similarity with non-metric functions on image-to-image retrieval datasets,
improving image retrieval performance on top of state-of-the-art methods.
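To make the idea of a non-metric similarity concrete, here is a minimal sketch that is not the thesis's method. It contrasts cosine similarity, which is symmetric, with a hypothetical learned bilinear score s(x, y) = x^T W y. When the weight matrix W (illustrative values, not learned here) is not symmetric, s(x, y) and s(y, x) generally differ, so the score cannot be derived from a metric distance, yet it can still rank a collection.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def cosine_similarity(x: Vector, y: Vector) -> float:
    """Symmetric baseline: for L2-normalised vectors, ranking by it is
    equivalent to ranking by Euclidean distance."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm > 0 else 0.0


def bilinear_similarity(x: Vector, y: Vector, w: Matrix) -> float:
    """Hypothetical learned score s(x, y) = x^T W y.
    If W is not symmetric, s(x, y) != s(y, x) in general (non-metric)."""
    return sum(x[i] * w[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def rank(query: Vector, collection: List[Vector],
         score: Callable[[Vector, Vector], float]) -> List[int]:
    """Return collection indices sorted from most to least similar."""
    return sorted(range(len(collection)),
                  key=lambda k: score(query, collection[k]), reverse=True)
```

With W = [[1.0, 0.5], [0.0, 1.0]], the query [1.0, 0.0] ranks the collection [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]] as [2, 1, 0] under cosine similarity but as [1, 2, 0] under the bilinear score, showing that the choice of similarity function alone changes the retrieval result.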
On the other hand, asymmetric visual retrieval is the problem in which queries
and elements in the dataset are from different types of visual data. We propose
methods to aggregate the temporal information of video segments so that
image-to-video comparisons can be computed using similarity functions. When
evaluated on image-to-video retrieval datasets, our algorithms drastically reduce
memory requirements while maintaining high accuracy.
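One simple way to realise temporal aggregation is mean pooling: consecutive frame descriptors of a video are averaged into a single vector per segment, and an image query is compared with the segment vectors by cosine similarity. The sketch below assumes this pooling scheme and a hypothetical fixed `segment_length`. It illustrates the memory trade-off and is not necessarily the aggregation used in the thesis. Storing one vector per segment instead of one per frame divides memory by roughly `segment_length`.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def l2_normalise(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v] if n > 0 else list(v)


def aggregate_segments(frames: List[Vector], segment_length: int) -> List[Vector]:
    """Assumed scheme: mean-pool each run of `segment_length` consecutive
    frame descriptors, storing one vector per segment instead of one per frame."""
    segments = []
    for start in range(0, len(frames), segment_length):
        chunk = [l2_normalise(f) for f in frames[start:start + segment_length]]
        mean = [sum(col) / len(chunk) for col in zip(*chunk)]
        segments.append(l2_normalise(mean))
    return segments


def search(image: Sequence[float], segments: List[Vector]) -> Tuple[int, float]:
    """Return (index, cosine score) of the segment most similar to the query image."""
    q = l2_normalise(image)
    scores = [sum(a * b for a, b in zip(q, s)) for s in segments]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

With six frame descriptors, the first three close to [1, 0] and the last three close to [0, 1], `aggregate_segments(frames, 3)` keeps only two vectors, and a query such as [0.1, 1.0] retrieves segment 1.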
Finally, we introduce new solutions for cross-modal retrieval, the task
in which either the queries or the elements in the collection are non-visual objects.
In particular, we study text-image retrieval in the domain of art by introducing
new models for semantic art understanding, obtaining results close to human performance.
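Cross-modal retrieval is commonly approached by mapping both modalities into a shared embedding space and ranking by similarity there. The following is a minimal sketch of text-to-image ranking under that assumption; the linear projections `w_text` and `w_image` are hypothetical placeholders for learned encoders and do not describe the thesis's art models.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def project(w: Matrix, v: Sequence[float]) -> List[float]:
    """Hypothetical linear map into the shared space: row i of w gives coordinate i."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def text_to_image(text_feat: Sequence[float], image_feats: List[Sequence[float]],
                  w_text: Matrix, w_image: Matrix) -> List[int]:
    """Rank images for a text query by cosine similarity in the shared space."""
    t = project(w_text, text_feat)
    scores = [cosine(t, project(w_image, f)) for f in image_feats]
    return sorted(range(len(image_feats)), key=lambda k: scores[k], reverse=True)
```

Because text and images may have feature vectors of different sizes (here a 3-dimensional text vector and 2-dimensional image vectors), the projections are what make the two modalities directly comparable.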
Overall, this thesis advances the state of the art in visual retrieval by presenting
novel solutions for some of the key tasks in the field. The contributions
derived from this work have potential direct applications in the era of big data,
as visual datasets keep growing exponentially and new techniques for
storing, accessing and managing large-scale visual collections are required.
Date of Award: 25 Mar 2019
Original language: English
Supervisor: George Vogiatzis


  • image retrieval
  • video retrieval
  • cross-modal retrieval
