Abstract
The rapid growth of multi-modal documents on the Internet makes multi-modal summarization research necessary. Most previous research summarizes texts or images separately. Recent neural summarization research shows the strength of the Encoder-Decoder model in text summarization. This paper proposes an abstractive text-image summarization model that uses an attentional hierarchical Encoder-Decoder model to summarize a text document and its accompanying images simultaneously, and then aligns the sentences and images in the summaries. A multi-modal attentional mechanism is proposed to attend to the original sentences, images, and captions when decoding. The DailyMail dataset is extended by collecting images and captions from the Web. Experiments show that our model outperforms neural abstractive and extractive text summarization methods that do not consider images. In addition, our model can generate informative summaries of images.
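The multi-modal attention described in the abstract can be illustrated with a minimal sketch. This is not the paper's formulation: the abstract does not specify the scoring function, so the code below assumes plain dot-product scores, a single softmax shared across all three modalities, and encoder vectors of equal dimensionality. The function name `multimodal_attention` and its arguments are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def multimodal_attention(decoder_state, sentences, images, captions):
    """Attend jointly over sentence, image, and caption vectors.

    Assumption (not from the paper): relevance is a dot product with the
    decoder state, and one softmax normalizes over all items together.
    Returns (context_vector, weights_by_modality).
    """
    groups = {"sentences": sentences, "images": images, "captions": captions}
    # Flatten the three modalities into one list, remembering each origin.
    items = [(name, vec) for name, vecs in groups.items() for vec in vecs]
    weights = softmax([dot(decoder_state, vec) for _, vec in items])

    # The context vector is the attention-weighted sum of all encoder vectors.
    dim = len(decoder_state)
    context = [
        sum(w * vec[i] for w, (_, vec) in zip(weights, items))
        for i in range(dim)
    ]

    # Regroup the weights so each modality can be inspected separately.
    by_modality = {name: [] for name in groups}
    for w, (name, _) in zip(weights, items):
        by_modality[name].append(w)
    return context, by_modality
```

Under these assumptions, the image with the largest weight at a decoding step could serve as a rough candidate for aligning an image with the summary sentence being generated, although the paper may perform the alignment differently.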
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 |
| Editors | Ellen Riloff, David Chiang, Julia Hockenmaier, Jun'ichi Tsujii |
| Publisher | Association for Computational Linguistics |
| Pages | 4046-4056 |
| Number of pages | 11 |
| ISBN (Electronic) | 9781948087841 |
| Publication status | Published - 1 Jan 2020 |
| Event | 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 - Brussels, Belgium; Duration: 31 Oct 2018 → 4 Nov 2018 |
Publication series
| Name | Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 |
|---|---|
Conference
| Conference | 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 |
|---|---|
| Country/Territory | Belgium |
| City | Brussels |
| Period | 31/10/18 → 4/11/18 |
Funding
The research was sponsored by the National Natural Science Foundation of China (Nos. 61806101 and 61876048) and the Natural Science Foundation of Jiangsu Province (BK20150862). We thank the anonymous reviewers for their helpful comments. Professor Hai Zhuge is the corresponding author.
Title: Abstractive text-image summarization using multi-modal attentional hierarchical RNN