What Size of Language Unit Is More Appropriate for Text Summarization?

Mengyun Cao, Hai Zhuge

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Extractive text summarization finds the important sentences in a text and concatenates them into a summary. However, sentences selected according to ranking rules are usually not coherent. Is a larger language unit, such as a group of sentences or a paragraph, more appropriate to select for summarization? This paper answers this question. Investigating a summarization algorithm based on ranking semantic link networks of texts, we obtain the following three results: 1) compared with summaries composed of sentences, summaries composed of larger language units have similar ROUGE scores but better readability; 2) using a group of sentences is more effective than using a sentence or a paragraph; and 3) the quality of summaries composed of sentence groups improves as the average length of the source texts increases.
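The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the paper ranks semantic link networks of texts, whereas the code below substitutes a plain Jaccard word-overlap score as a stand-in. The function names (`split_units`, `rank_units`, `summarize`), the fixed `group_size` of two adjacent sentences, and the word budget are all assumptions made for illustration only.

```python
import re


def sentences(paragraph):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def split_units(text, unit="sentence", group_size=2):
    """Return the text as a list of sentences, sentence groups, or paragraphs."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if unit == "paragraph":
        return paragraphs
    units = []
    for p in paragraphs:
        sents = sentences(p)
        if unit == "sentence":
            units.extend(sents)
        elif unit == "group":
            # Assumption: a group is a run of adjacent sentences in one paragraph.
            for i in range(0, len(sents), group_size):
                units.append(" ".join(sents[i:i + group_size]))
        else:
            raise ValueError(f"unknown unit: {unit}")
    return units


def words(unit):
    return set(re.findall(r"[a-z0-9]+", unit.lower()))


def rank_units(units):
    """Score each unit by its summed Jaccard word overlap with every other unit."""
    bags = [words(u) for u in units]
    scores = []
    for i, a in enumerate(bags):
        total = 0.0
        for j, b in enumerate(bags):
            if i != j and (a | b):
                total += len(a & b) / len(a | b)
        scores.append(total)
    return scores


def summarize(text, unit="sentence", max_words=60, group_size=2):
    """Pick the highest-scoring units within a word budget, kept in source order."""
    units = split_units(text, unit, group_size)
    scores = rank_units(units)
    order = sorted(range(len(units)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        n = len(units[i].split())
        # The top-ranked unit is always kept, even if it exceeds the budget.
        if chosen and used + n > max_words:
            continue
        chosen.append(i)
        used += n
    return " ".join(units[i] for i in sorted(chosen))
```

Calling `summarize(text, unit="sentence")`, `summarize(text, unit="group")`, and `summarize(text, unit="paragraph")` on the same input gives three candidate summaries of similar length, which could then be compared for ROUGE score and readability in the way the abstract describes.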
Original language: English
Title of host publication: Proceedings - 2018 14th International Conference on Semantics, Knowledge and Grids, SKG 2018
Publisher: IEEE
Pages: 196-202
Number of pages: 7
ISBN (Electronic): 978-1-7281-0441-6
ISBN (Print): 978-1-7281-0442-3
DOIs: https://doi.org/10.1109/SKG.2018.00036
Publication status: Published - 2 May 2019
Event: 2018 14th International Conference on Semantics, Knowledge and Grids (SKG) - Guangzhou, China
Duration: 12 Sep 2018 to 14 Sep 2018

Publication series

Name: 2018 14th International Conference on Semantics, Knowledge and Grids (SKG)
Publisher: IEEE
ISSN (Electronic): 2325-0623

Conference

Conference: 2018 14th International Conference on Semantics, Knowledge and Grids (SKG)
Period: 12/09/18 to 14/09/18

Keywords

  • ranking
  • semantic link network
  • text summarization

Cite this

Cao, M., & Zhuge, H. (2019). What Size of Language Unit Is More Appropriate for Text Summarization? In Proceedings - 2018 14th International Conference on Semantics, Knowledge and Grids, SKG 2018 (pp. 196-202). [8703948] (2018 14th International Conference on Semantics, Knowledge and Grids (SKG)). IEEE. https://doi.org/10.1109/SKG.2018.00036
@inproceedings{ded8fdcb8017408098fa363f7255db4a,
title = "What Size of Language Unit Is More Appropriate for Text Summarization?",
keywords = "ranking, semantic link network, text summarization",
author = "Mengyun Cao and Hai Zhuge",
year = "2019",
month = "5",
day = "2",
doi = "10.1109/SKG.2018.00036",
language = "English",
isbn = "978-1-7281-0442-3",
series = "2018 14th International Conference on Semantics, Knowledge and Grids (SKG)",
publisher = "IEEE",
pages = "196--202",
booktitle = "Proceedings - 2018 14th International Conference on Semantics, Knowledge and Grids, SKG 2018",
address = "United States",

}

TY - GEN
T1 - What Size of Language Unit Is More Appropriate for Text Summarization?
AU - Cao, Mengyun
AU - Zhuge, Hai
PY - 2019/5/2
Y1 - 2019/5/2
KW - ranking
KW - semantic link network
KW - text summarization
UR - https://ieeexplore.ieee.org/document/8703948/
UR - http://www.scopus.com/inward/record.url?scp=85065794921&partnerID=8YFLogxK
U2 - 10.1109/SKG.2018.00036
DO - 10.1109/SKG.2018.00036
M3 - Conference contribution
SN - 978-1-7281-0442-3
T3 - 2018 14th International Conference on Semantics, Knowledge and Grids (SKG)
SP - 196
EP - 202
BT - Proceedings - 2018 14th International Conference on Semantics, Knowledge and Grids, SKG 2018
PB - IEEE
ER -
