Automatic labelling of topic models learned from Twitter by summarisation

Elizabeth Cano; Yulan He

Computer Science Research Group

Research output: Chapter in Book/Published conference output › Conference publication

Abstract

Latent topics derived by topic models such as Latent Dirichlet Allocation (LDA) are the result of hidden thematic structures which provide further insights into the data. The automatic labelling of such topics derived from social media, however, poses new challenges since topics may characterise novel events happening in the real world. Existing automatic topic labelling approaches which depend on external knowledge sources become less applicable here since relevant articles/concepts of the extracted topics may not exist in external sources. In this paper we propose to address the problem of automatic labelling of latent topics learned from Twitter as a summarisation problem. We introduce a framework which applies summarisation algorithms to generate topic labels. These algorithms are independent of external sources and only rely on the identification of dominant terms in documents related to the latent topic. We compare the efficiency of existing state-of-the-art summarisation algorithms. Our results suggest that summarisation algorithms generate better topic labels which capture event-related context compared to the top-n terms returned by LDA.
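The abstract treats topic labelling as summarising the tweets associated with a latent topic. A minimal sketch of that idea follows, using SumBasic (a well-known frequency-based extractive summariser) purely as an example; it is not the authors' code, and the algorithms and settings compared in the paper may differ. It assumes the tweets have already been grouped by topic (for instance, by each tweet's most probable LDA topic); the crude regex tokeniser, the toy tweets, and the names `tokenize` and `sumbasic_label` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and keep alphanumeric tokens (a deliberately crude tokeniser)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sumbasic_label(docs, n_sentences=1, stopwords=frozenset()):
    """Return up to n_sentences tweets from docs as an extractive label for one topic.

    Hypothetical helper illustrating SumBasic; not the paper's implementation.
    """
    tokenized = [[w for w in tokenize(d) if w not in stopwords] for d in docs]
    counts = Counter(w for toks in tokenized for w in toks)
    total = sum(counts.values())
    if total == 0:
        return []
    # Step 1: unigram probabilities estimated from the topic's own documents.
    prob = {w: c / total for w, c in counts.items()}

    def avg_prob(i):
        # Step 2: a document's score is the mean probability of its words.
        toks = tokenized[i]
        return sum(prob[w] for w in toks) / len(toks) if toks else 0.0

    chosen = []
    remaining = set(range(len(docs)))
    while len(chosen) < n_sentences and remaining:
        vocab = {w for i in remaining for w in tokenized[i]}
        if not vocab:
            break
        # Step 3: take the most probable remaining word (sorted for deterministic ties) ...
        top_word = max(sorted(vocab), key=lambda w: prob[w])
        # ... and the best-scoring remaining document that contains it.
        candidates = [i for i in sorted(remaining) if top_word in tokenized[i]]
        best = max(candidates, key=avg_prob)
        chosen.append(docs[best])
        remaining.discard(best)
        # Step 4: square the probabilities of covered words to discourage redundancy.
        for w in set(tokenized[best]):
            prob[w] = prob[w] ** 2
    return chosen


tweets = [
    "Earthquake hits Chile coast",
    "Strong earthquake in Chile today",
    "I love pizza",
]
print(sumbasic_label(tweets))  # ['Earthquake hits Chile coast']
```

Here the label is a whole tweet that mentions the dominant terms ("earthquake", "chile") in context, whereas an LDA top-n label would be a bare list of high-probability words. A real setup would also pass a stopword list through `stopwords`.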

Original language	English
Title of host publication	52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference
Publisher	Association for Computational Linguistics
Pages	618-624
Number of pages	7
Volume	2
ISBN (Print)	978-1-937284-73-2
Publication status	Published - 2014
Event	52nd annual meeting of the Association for Computational Linguistics - Baltimore, MD, United States. Duration: 22 Jun 2014 → 27 Jun 2014

Meeting

Meeting	52nd annual meeting of the Association for Computational Linguistics
Abbreviated title	ACL 2014
Country/Territory	United States
City	Baltimore, MD
Period	22/06/14 → 27/06/14

Cite this

@inproceedings{541649b255604b8da18fa34dda08a663,
  title = "Automatic labelling of topic models learned from {Twitter} by summarisation",
  abstract = "Latent topics derived by topic models such as Latent Dirichlet Allocation (LDA) are the result of hidden thematic structures which provide further insights into the data. The automatic labelling of such topics derived from social media, however, poses new challenges since topics may characterise novel events happening in the real world. Existing automatic topic labelling approaches which depend on external knowledge sources become less applicable here since relevant articles/concepts of the extracted topics may not exist in external sources. In this paper we propose to address the problem of automatic labelling of latent topics learned from Twitter as a summarisation problem. We introduce a framework which applies summarisation algorithms to generate topic labels. These algorithms are independent of external sources and only rely on the identification of dominant terms in documents related to the latent topic. We compare the efficiency of existing state-of-the-art summarisation algorithms. Our results suggest that summarisation algorithms generate better topic labels which capture event-related context compared to the top-n terms returned by LDA.",
  author = "Elizabeth Cano and Yulan He",
  year = "2014",
  language = "English",
  isbn = "978-1-937284-73-2",
  volume = "2",
  pages = "618--624",
  booktitle = "52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference",
  publisher = "Association for Computational Linguistics",
  note = "52nd annual meeting of the Association for Computational Linguistics, ACL 2014; Conference date: 22-06-2014 through 27-06-2014",
}

Cano, E & He, Y 2014, Automatic labelling of topic models learned from Twitter by summarisation. in 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference. vol. 2, Association for Computational Linguistics, pp. 618-624, 52nd annual meeting of the Association for Computational Linguistics, Baltimore, MD, United States, 22/06/14.

Automatic labelling of topic models learned from Twitter by summarisation. / Cano, Elizabeth; He, Yulan.
52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference. Vol. 2 Association for Computational Linguistics, 2014. p. 618-624.

TY  - GEN
T1  - Automatic labelling of topic models learned from Twitter by summarisation
AU  - Cano, Elizabeth
AU  - He, Yulan
PY  - 2014
Y1  - 2014
N2  - Latent topics derived by topic models such as Latent Dirichlet Allocation (LDA) are the result of hidden thematic structures which provide further insights into the data. The automatic labelling of such topics derived from social media, however, poses new challenges since topics may characterise novel events happening in the real world. Existing automatic topic labelling approaches which depend on external knowledge sources become less applicable here since relevant articles/concepts of the extracted topics may not exist in external sources. In this paper we propose to address the problem of automatic labelling of latent topics learned from Twitter as a summarisation problem. We introduce a framework which applies summarisation algorithms to generate topic labels. These algorithms are independent of external sources and only rely on the identification of dominant terms in documents related to the latent topic. We compare the efficiency of existing state-of-the-art summarisation algorithms. Our results suggest that summarisation algorithms generate better topic labels which capture event-related context compared to the top-n terms returned by LDA.
AB  - Latent topics derived by topic models such as Latent Dirichlet Allocation (LDA) are the result of hidden thematic structures which provide further insights into the data. The automatic labelling of such topics derived from social media, however, poses new challenges since topics may characterise novel events happening in the real world. Existing automatic topic labelling approaches which depend on external knowledge sources become less applicable here since relevant articles/concepts of the extracted topics may not exist in external sources. In this paper we propose to address the problem of automatic labelling of latent topics learned from Twitter as a summarisation problem. We introduce a framework which applies summarisation algorithms to generate topic labels. These algorithms are independent of external sources and only rely on the identification of dominant terms in documents related to the latent topic. We compare the efficiency of existing state-of-the-art summarisation algorithms. Our results suggest that summarisation algorithms generate better topic labels which capture event-related context compared to the top-n terms returned by LDA.
UR  - http://www.scopus.com/inward/record.url?scp=84906922607&partnerID=8YFLogxK
M3  - Conference publication
AN  - SCOPUS:84906922607
SN  - 978-1-937284-73-2
VL  - 2
SP  - 618
EP  - 624
BT  - 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference
PB  - Association for Computational Linguistics
T2  - 52nd annual meeting of the Association for Computational Linguistics
Y2  - 22 June 2014 through 27 June 2014
ER  - 
