Visually guided spatial relation extraction from text

Taher Rahgooy, Umar Manzoor, Parisa Kordjamshidi

Research output: Chapter in Book/Published conference output › Conference publication

Abstract

Extraction of spatial relations from sentences with complex/nested relationships is very challenging, as it often requires resolving inherent semantic ambiguities. We seek help from the visual modality to fill the information gap in the text modality and resolve spatial semantic ambiguities. We use various recent vision-and-language datasets and techniques to train inter-modality alignment models and visual relationship classifiers, and propose a novel global inference model to integrate these components into our structured output prediction model for spatial role and relation extraction. Our global inference model enables us to utilize the visual and geometric relationships between objects and improves state-of-the-art results for spatial information extraction from text.
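As a rough illustration of the kind of integration the abstract describes, the sketch below scores candidate (trajector, spatial indicator, landmark) triplets with a weighted combination of hypothetical textual and visual classifier scores and keeps the best-scoring pair per indicator. The example sentence, the scores, the weight ALPHA, the threshold, and the greedy selection rule are all assumptions made for illustration; the paper's actual global inference is a structured output prediction model over these components, not this greedy sketch.

# Illustrative sketch only: combining hypothetical textual and visual evidence
# for spatial-triplet selection. All names, scores, and the combination rule
# are assumptions for illustration, not the authors' actual model.

from itertools import product

# Toy candidate roles extracted from a sentence such as
# "the kids are playing on the grass in front of the house".
trajectors = ["kids"]
indicators = ["on", "in front of"]
landmarks = ["grass", "house"]

# Hypothetical textual classifier scores per (trajector, indicator, landmark) triplet.
text_scores = {
    ("kids", "on", "grass"): 0.82,
    ("kids", "on", "house"): 0.35,
    ("kids", "in front of", "grass"): 0.30,
    ("kids", "in front of", "house"): 0.74,
}

# Hypothetical visual relationship scores obtained after aligning the phrases
# with image regions (e.g. output of a visual relationship classifier).
visual_scores = {
    ("kids", "on", "grass"): 0.90,
    ("kids", "on", "house"): 0.10,
    ("kids", "in front of", "grass"): 0.20,
    ("kids", "in front of", "house"): 0.65,
}

ALPHA = 0.6       # assumed weight on the textual evidence
THRESHOLD = 0.5   # assumed acceptance threshold on the combined score


def joint_score(triplet):
    """Linear combination of textual and visual evidence (an assumption)."""
    return ALPHA * text_scores.get(triplet, 0.0) + (1 - ALPHA) * visual_scores.get(triplet, 0.0)


def infer_relations():
    """Greedy joint decision: each spatial indicator keeps only its
    best-scoring (trajector, landmark) pair, if it clears the threshold."""
    selected = []
    for ind in indicators:
        candidates = [(tr, ind, lm) for tr, lm in product(trajectors, landmarks)]
        best = max(candidates, key=joint_score)
        if joint_score(best) >= THRESHOLD:
            selected.append((best, round(joint_score(best), 3)))
    return selected


if __name__ == "__main__":
    for triplet, score in infer_relations():
        print(triplet, score)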

Original language: English
Title of host publication: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2, Short Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 788-794
Number of pages: 7
ISBN (Electronic): 9781948087292
Publication status: Published - Jun 2018
Event: 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2018 - New Orleans, United States
Duration: 1 Jun 2018 - 6 Jun 2018

Publication series

Name: NAACL HLT 2018 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference
Volume: 2

Conference

Conference: 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2018
Country/Territory: United States
City: New Orleans
Period: 1/06/18 - 6/06/18
