How to read paintings: Semantic art understanding with multi-modal retrieval

Noa Garcia; George Vogiatzis

doi:10.1007/978-3-030-11012-3_52

Noa Garcia^*, George Vogiatzis

^*Corresponding author for this work

Research output: Chapter in Book/Published conference output › Conference publication

Abstract

Automatic art analysis has been mostly focused on classifying artworks into different artistic styles. However, understanding an artistic representation involves more complex processes, such as identifying the elements in the scene or recognizing author influences. We present SemArt, a multi-modal dataset for semantic art understanding. SemArt is a collection of fine-art painting images in which each image is associated with a number of attributes and a textual artistic comment, such as those that appear in art catalogues or museum collections. To evaluate semantic art understanding, we envisage the Text2Art challenge, a multi-modal retrieval task where relevant paintings are retrieved according to an artistic text, and vice versa. We also propose several models for encoding visual and textual artistic representations into a common semantic space. Our best approach is able to find the correct image within the top 10 ranked images in 45.5% of the test samples. Moreover, our models show remarkable levels of art understanding when compared against human evaluation.

Original language	English
Title of host publication	Computer Vision – ECCV 2018 Workshops, Proceedings
Editors	Stefan Roth, Laura Leal-Taixé
Publisher	Springer
Pages	676-691
Number of pages	16
Volume	11130
ISBN (Electronic)	978-3-030-11012-3
ISBN (Print)	9783030110116
DOIs	https://doi.org/10.1007/978-3-030-11012-3_52
Publication status	Published - 29 Jan 2019
Event	15th European Conference on Computer Vision, ECCV 2018, Munich, Germany (8 Sept 2018 → 14 Sept 2018)

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	11130 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	15th European Conference on Computer Vision, ECCV 2018
Country/Territory	Germany
City	Munich
Period	8/09/18 → 14/09/18

Keywords

Art analysis
Image-text retrieval
Multi-modal retrieval
Semantic art understanding

Access to Document

10.1007/978-3-030-11012-3_52

Cite this

Garcia, N., & Vogiatzis, G. (2019). How to read paintings: Semantic art understanding with multi-modal retrieval. In S. Roth, & L. Leal-Taixé (Eds.), Computer Vision – ECCV 2018 Workshops, Proceedings (Vol. 11130, pp. 676-691). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11130 LNCS). Springer. https://doi.org/10.1007/978-3-030-11012-3_52

Garcia, Noa ; Vogiatzis, George. / How to read paintings : Semantic art understanding with multi-modal retrieval. Computer Vision – ECCV 2018 Workshops, Proceedings. editor / Stefan Roth ; Laura Leal-Taixé. Vol. 11130 Springer, 2019. pp. 676-691 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{cfc9c29bd0bb41b1a65d3fc25cf60b4f,

title = "How to read paintings: Semantic art understanding with multi-modal retrieval",

abstract = "Automatic art analysis has been mostly focused on classifying artworks into different artistic styles. However, understanding an artistic representation involves more complex processes, such as identifying the elements in the scene or recognizing author influences. We present SemArt, a multi-modal dataset for semantic art understanding. SemArt is a collection of fine-art painting images in which each image is associated with a number of attributes and a textual artistic comment, such as those that appear in art catalogues or museum collections. To evaluate semantic art understanding, we envisage the Text2Art challenge, a multi-modal retrieval task where relevant paintings are retrieved according to an artistic text, and vice versa. We also propose several models for encoding visual and textual artistic representations into a common semantic space. Our best approach is able to find the correct image within the top 10 ranked images in 45.5% of the test samples. Moreover, our models show remarkable levels of art understanding when compared against human evaluation.",

keywords = "Art analysis, Image-text retrieval, Multi-modal retrieval, Semantic art understanding",

author = "Noa Garcia and George Vogiatzis",

year = "2019",

month = jan,

day = "29",

doi = "10.1007/978-3-030-11012-3_52",

language = "English",

isbn = "9783030110116",

volume = "11130",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer",

pages = "676--691",

editor = "Stefan Roth and Laura Leal-Taix{\'e}",

booktitle = "Computer Vision – ECCV 2018 Workshops, Proceedings",

address = "Germany",

note = "15th European Conference on Computer Vision, ECCV 2018 ; Conference date: 08-09-2018 Through 14-09-2018",

}

Garcia, N & Vogiatzis, G 2019, How to read paintings: Semantic art understanding with multi-modal retrieval. in S Roth & L Leal-Taixé (eds), Computer Vision – ECCV 2018 Workshops, Proceedings. vol. 11130, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11130 LNCS, Springer, pp. 676-691, 15th European Conference on Computer Vision, ECCV 2018, Munich, Germany, 8/09/18. https://doi.org/10.1007/978-3-030-11012-3_52

How to read paintings: Semantic art understanding with multi-modal retrieval. / Garcia, Noa; Vogiatzis, George.
Computer Vision – ECCV 2018 Workshops, Proceedings. ed. / Stefan Roth; Laura Leal-Taixé. Vol. 11130 Springer, 2019. p. 676-691 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11130 LNCS).

TY - GEN

T1 - How to read paintings

T2 - 15th European Conference on Computer Vision, ECCV 2018

AU - Garcia, Noa

AU - Vogiatzis, George

PY - 2019/1/29

Y1 - 2019/1/29

N2 - Automatic art analysis has been mostly focused on classifying artworks into different artistic styles. However, understanding an artistic representation involves more complex processes, such as identifying the elements in the scene or recognizing author influences. We present SemArt, a multi-modal dataset for semantic art understanding. SemArt is a collection of fine-art painting images in which each image is associated with a number of attributes and a textual artistic comment, such as those that appear in art catalogues or museum collections. To evaluate semantic art understanding, we envisage the Text2Art challenge, a multi-modal retrieval task where relevant paintings are retrieved according to an artistic text, and vice versa. We also propose several models for encoding visual and textual artistic representations into a common semantic space. Our best approach is able to find the correct image within the top 10 ranked images in 45.5% of the test samples. Moreover, our models show remarkable levels of art understanding when compared against human evaluation.

AB - Automatic art analysis has been mostly focused on classifying artworks into different artistic styles. However, understanding an artistic representation involves more complex processes, such as identifying the elements in the scene or recognizing author influences. We present SemArt, a multi-modal dataset for semantic art understanding. SemArt is a collection of fine-art painting images in which each image is associated with a number of attributes and a textual artistic comment, such as those that appear in art catalogues or museum collections. To evaluate semantic art understanding, we envisage the Text2Art challenge, a multi-modal retrieval task where relevant paintings are retrieved according to an artistic text, and vice versa. We also propose several models for encoding visual and textual artistic representations into a common semantic space. Our best approach is able to find the correct image within the top 10 ranked images in 45.5% of the test samples. Moreover, our models show remarkable levels of art understanding when compared against human evaluation.

KW - Art analysis

KW - Image-text retrieval

KW - Multi-modal retrieval

KW - Semantic art understanding

UR - http://www.scopus.com/inward/record.url?scp=85061817245&partnerID=8YFLogxK

UR - https://link.springer.com/chapter/10.1007%2F978-3-030-11012-3_52

U2 - 10.1007/978-3-030-11012-3_52

DO - 10.1007/978-3-030-11012-3_52

M3 - Conference publication

AN - SCOPUS:85061817245

SN - 9783030110116

VL - 11130

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 676

EP - 691

BT - Computer Vision – ECCV 2018 Workshops, Proceedings

A2 - Roth, Stefan

A2 - Leal-Taixé, Laura

PB - Springer

Y2 - 8 September 2018 through 14 September 2018

ER -

Garcia N, Vogiatzis G. How to read paintings: Semantic art understanding with multi-modal retrieval. In Roth S, Leal-Taixé L, editors, Computer Vision – ECCV 2018 Workshops, Proceedings. Vol. 11130. Springer. 2019. p. 676-691. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-11012-3_52