Event extraction from Twitter using non-parametric Bayesian mixture model with word embeddings

Deyu Zhou; Xuan Zhang; Yulan He

Computer Science Research Group

Research output: Chapter in Book/Published conference output › Conference publication

Abstract

To extract structured representations of newsworthy events from Twitter, unsupervised models typically assume that tweets involving the same named entities and expressed using similar words are likely to belong to the same event. Hence, they group tweets into clusters based on the co-occurrence patterns of named entities and topical keywords. However, there are two main limitations. First, they require the number of events to be known beforehand, which is not realistic in practical applications. Second, they don’t recognise that the same named entity might be referred to by multiple mentions and tweets using different mentions would be wrongly assigned to different events. To overcome these limitations, we propose a non-parametric Bayesian mixture model with word embeddings for event extraction, in which the number of events can be inferred automatically and the issue of lexical variations for the same named entity can be dealt with properly. Our model has been evaluated on three datasets with sizes ranging between 2,499 and over 60 million tweets. Experimental results show that our model outperforms the baseline approach on all datasets by 5-8% in F-measure.
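The abstract's central claim is that a non-parametric prior lets the number of events be inferred instead of fixed in advance, and that embeddings help group different mentions of the same entity. The Python sketch below is one rough way to illustrate this. It is not the authors' model. It computes Chinese restaurant process assignment probabilities for a single tweet, where each candidate event is scored with a hypothetical embedding-similarity likelihood. The functions `cosine`, `crp_assignment_probs` and `embedding_likelihood`, the concentration `alpha`, the sharpness `kappa`, the flat new-event likelihood and the toy 2-d vectors are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def crp_assignment_probs(
    cluster_sizes: Sequence[int],
    likelihoods: Sequence[float],
    new_cluster_likelihood: float,
    alpha: float,
) -> List[float]:
    """Normalised probabilities of assigning one item to each existing
    cluster, with the final entry standing for a brand-new cluster, under
    a Chinese restaurant process prior with concentration alpha.

    Existing cluster k gets weight n_k * L_k; a new cluster gets
    alpha * L_new. The number of clusters is therefore never fixed.
    """
    if len(cluster_sizes) != len(likelihoods):
        raise ValueError("one likelihood per existing cluster is required")
    weights = [n * lik for n, lik in zip(cluster_sizes, likelihoods)]
    weights.append(alpha * new_cluster_likelihood)
    total = sum(weights)
    return [w / total for w in weights]


def embedding_likelihood(
    x: Sequence[float], centroid: Sequence[float], kappa: float = 5.0
) -> float:
    """Hypothetical unnormalised likelihood exp(kappa * cos(x, centroid)).

    Mentions with nearby embeddings, such as two spellings of one entity,
    score similarly against the same cluster centroid.
    """
    return math.exp(kappa * cosine(x, centroid))


if __name__ == "__main__":
    # Toy 2-d "embeddings"; a real system would use pretrained vectors.
    centroids = [[1.0, 0.1], [0.0, 1.0]]
    sizes = [3, 2]
    mention = [0.9, 0.2]  # e.g. a variant mention close to cluster 0
    liks = [embedding_likelihood(mention, c) for c in centroids]
    new_lik = 1.0  # assumed flat likelihood under the base measure
    probs = crp_assignment_probs(sizes, liks, new_lik, alpha=1.0)
    print([round(p, 3) for p in probs])
```

A larger `alpha` makes opening a new event more likely, which is how this kind of prior trades off between reusing existing events and creating new ones.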

Original language	English
Title of host publication	Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics
Publisher	Association for Computational Linguistics
Pages	808–817
Number of pages	10
Volume	1
ISBN (Electronic)	978-1-51083860-4
Publication status	Published - 7 Apr 2017
Event	15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017 - Valencia Conference Center, Valencia, Spain. Duration: 3 Apr 2017 → 7 Apr 2017

Conference

Conference	15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017
Country/Territory	Spain
City	Valencia
Period	3/04/17 → 7/04/17

Access to Document

http://aclweb.org/anthology/E17-1076

Cite this

@inproceedings{82cdf3ae68c14afe801e21ba89dc4da6,
  title = "Event extraction from {Twitter} using non-parametric {Bayesian} mixture model with word embeddings",
  abstract = "To extract structured representations of newsworthy events from Twitter, unsupervised models typically assume that tweets involving the same named entities and expressed using similar words are likely to belong to the same event. Hence, they group tweets into clusters based on the co-occurrence patterns of named entities and topical keywords. However, there are two main limitations. First, they require the number of events to be known beforehand, which is not realistic in practical applications. Second, they don{\textquoteright}t recognise that the same named entity might be referred to by multiple mentions and tweets using different mentions would be wrongly assigned to different events. To overcome these limitations, we propose a non-parametric Bayesian mixture model with word embeddings for event extraction, in which the number of events can be inferred automatically and the issue of lexical variations for the same named entity can be dealt with properly. Our model has been evaluated on three datasets with sizes ranging between 2,499 and over 60 million tweets. Experimental results show that our model outperforms the baseline approach on all datasets by 5--8\% in F-measure.",
  author = "Deyu Zhou and Xuan Zhang and Yulan He",
  note = "15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017; Conference date: 03-04-2017 through 07-04-2017",
  year = "2017",
  month = apr,
  day = "7",
  language = "English",
  volume = "1",
  pages = "808--817",
  booktitle = "Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics",
  publisher = "Association for Computational Linguistics",
  url = "http://aclweb.org/anthology/E17-1076",
}

Zhou, D, Zhang, X & He, Y 2017, Event extraction from Twitter using non-parametric Bayesian mixture model with word embeddings. in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. vol. 1, Association for Computational Linguistics, pp. 808–817, 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, Valencia, Spain, 3/04/17. <http://aclweb.org/anthology/E17-1076>

Event extraction from Twitter using non-parametric Bayesian mixture model with word embeddings. / Zhou, Deyu; Zhang, Xuan; He, Yulan.
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. Vol. 1, Association for Computational Linguistics, 2017. p. 808–817.


TY  - GEN
T1  - Event extraction from Twitter using non-parametric Bayesian mixture model with word embeddings
AU  - Zhou, Deyu
AU  - Zhang, Xuan
AU  - He, Yulan
PY  - 2017/04/07
Y1  - 2017/04/07
AB  - To extract structured representations of newsworthy events from Twitter, unsupervised models typically assume that tweets involving the same named entities and expressed using similar words are likely to belong to the same event. Hence, they group tweets into clusters based on the co-occurrence patterns of named entities and topical keywords. However, there are two main limitations. First, they require the number of events to be known beforehand, which is not realistic in practical applications. Second, they don’t recognise that the same named entity might be referred to by multiple mentions and tweets using different mentions would be wrongly assigned to different events. To overcome these limitations, we propose a non-parametric Bayesian mixture model with word embeddings for event extraction, in which the number of events can be inferred automatically and the issue of lexical variations for the same named entity can be dealt with properly. Our model has been evaluated on three datasets with sizes ranging between 2,499 and over 60 million tweets. Experimental results show that our model outperforms the baseline approach on all datasets by 5-8% in F-measure.
UR  - http://www.scopus.com/inward/record.url?scp=85021649806&partnerID=8YFLogxK
M3  - Conference publication
AN  - SCOPUS:85021649806
VL  - 1
SP  - 808
EP  - 817
BT  - Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics
PB  - Association for Computational Linguistics
T2  - 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017
Y2  - 3 April 2017 through 7 April 2017
ER  -
