An Efficient Approach for Geo-Multimedia Cross-Modal Retrieval

Lei Zhu, Jun Long, Chengyuan Zhang, Weiren Yu, Xinpan Yuan, Longzhi Sun

Research output: Contribution to journal › Article

Abstract

Due to the rapid development of mobile Internet techniques, such as online social networking and location-based services, massive amounts of multimedia data with geographical information are generated and uploaded to the Internet. In this paper, we propose a novel type of cross-modal multimedia retrieval, called geo-multimedia cross-modal retrieval, which aims to find a set of geo-multimedia objects according to geographical proximity and semantic concept similarity. Previous studies on cross-modal retrieval and spatial keyword search cannot address this problem effectively because they do not consider multimedia data with geo-tags (geo-multimedia). Firstly, we present the definition of the kNN geo-multimedia cross-modal query and introduce relevant concepts such as spatial distance and semantic similarity measurement. As the key notion of this work, the cross-modal semantic representation space is formulated for the first time. A novel framework for geo-multimedia cross-modal retrieval is proposed, which includes multi-modal feature extraction, cross-modal semantic space mapping, geo-multimedia spatial indexing and cross-modal semantic similarity measurement. To bridge the semantic gap between different modalities, we also propose a method named cross-modal semantic matching (CoSMat for short), which contains two important components, CorrProj and LogsTran, and aims to build a common semantic representation space for cross-modal semantic similarity measurement. In addition, to implement semantic similarity measurement, we employ a deep learning based method to learn multi-modal features that contain more high-level semantic information. Moreover, a novel hybrid index, the GMR-Tree, is carefully designed, which combines signatures of semantic representations with an R-Tree. An efficient GMR-Tree based kNN search algorithm called kGMCMS is developed. Comprehensive experimental evaluations on real and synthetic datasets clearly demonstrate that our approach outperforms state-of-the-art methods.
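The abstract describes a query that ranks geo-multimedia objects by combining spatial proximity with semantic similarity computed in a common cross-modal representation space. The minimal Python sketch below is an illustration only, not a reproduction of the paper's CoSMat mapping, GMR-Tree index, or kGMCMS algorithm: the class and function names, the linear weighting of the two scores, and the brute-force scan are assumptions introduced here to make the idea concrete.

```python
import heapq
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GeoMultimediaObject:
    """A geo-tagged multimedia object: a location plus a vector assumed to live
    in a shared cross-modal semantic space (e.g. produced by a learned mapping)."""
    obj_id: str
    lat: float
    lon: float
    semantic_vec: List[float]

def spatial_distance(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Planar Euclidean distance as a stand-in for the spatial proximity measure."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Semantic similarity between two vectors in the shared representation space."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_geo_semantic(query_loc, query_vec, objects, k=5, alpha=0.5, max_dist=1.0):
    """Brute-force kNN over a combined score: alpha weights spatial proximity against
    semantic similarity. In practice a spatial index such as an R-Tree would prune
    candidates instead of this linear scan; the scan is only for illustration."""
    scored = []
    for o in objects:
        proximity = 1.0 - min(spatial_distance(query_loc, (o.lat, o.lon)) / max_dist, 1.0)
        semantic = cosine_similarity(query_vec, o.semantic_vec)
        scored.append((alpha * proximity + (1.0 - alpha) * semantic, o.obj_id))
    return heapq.nlargest(k, scored)

# Toy usage: query by location and a semantic vector (e.g. derived from a text query).
if __name__ == "__main__":
    db = [
        GeoMultimediaObject("img_1", 0.10, 0.20, [0.9, 0.1, 0.0]),
        GeoMultimediaObject("txt_2", 0.50, 0.40, [0.2, 0.8, 0.1]),
        GeoMultimediaObject("img_3", 0.12, 0.22, [0.1, 0.2, 0.9]),
    ]
    print(knn_geo_semantic((0.11, 0.21), [0.8, 0.2, 0.1], db, k=2))
```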

Original language: English
Article number: 8827517
Pages (from-to): 180571-180589
Number of pages: 19
Journal: IEEE Access
Volume: 7
Early online date: 9 Sep 2019
DOIs: https://doi.org/10.1109/ACCESS.2019.2940055
Publication status: E-pub ahead of print - 9 Sep 2019

Fingerprint

Semantics
Internet
Location based services
Feature extraction

Bibliographical note

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

Keywords

  • Cross-modal retrieval
  • deep learning
  • geo-multimedia
  • kNN spatial search

Cite this

Zhu, L., Long, J., Zhang, C., Yu, W., Yuan, X., & Sun, L. (2019). An Efficient Approach for Geo-Multimedia Cross-Modal Retrieval. IEEE Access, 7, 180571-180589. [8827517]. https://doi.org/10.1109/ACCESS.2019.2940055
Zhu, Lei ; Long, Jun ; Zhang, Chengyuan ; Yu, Weiren ; Yuan, Xinpan ; Sun, Longzhi. / An Efficient Approach for Geo-Multimedia Cross-Modal Retrieval. In: IEEE Access. 2019 ; Vol. 7. pp. 180571-180589.

Zhu, L, Long, J, Zhang, C, Yu, W, Yuan, X & Sun, L 2019, 'An Efficient Approach for Geo-Multimedia Cross-Modal Retrieval', IEEE Access, vol. 7, 8827517, pp. 180571-180589. https://doi.org/10.1109/ACCESS.2019.2940055


Zhu L, Long J, Zhang C, Yu W, Yuan X, Sun L. An Efficient Approach for Geo-Multimedia Cross-Modal Retrieval. IEEE Access. 2019 Sep 9;7:180571-180589. 8827517. https://doi.org/10.1109/ACCESS.2019.2940055