A weakly-supervised Bayesian model for violence detection from social media

Elizabeth Cano; Yulan He; Kang Liu; Jun Zhao

Computer Science Research Group

Research output: Chapter in Book/Published conference output › Conference publication

Abstract

Social streams have proven to be the most up-to-date and inclusive information on current events. In this paper we propose a novel probabilistic modelling framework, called violence detection model (VDM), which enables the identification of text containing violent content and extraction of violence-related topics over social media data. The proposed VDM model does not require any labeled corpora for training; instead, it only needs the incorporation of word prior knowledge which captures whether a word indicates violence or not. We propose a novel approach of deriving word prior knowledge using the relative entropy measurement of words based on the intuition that low entropy words are indicative of semantically coherent topics and therefore more informative, while high entropy words indicate words whose usage is more topically diverse and therefore less informative. Our proposed VDM model has been evaluated on the TREC Microblog 2011 dataset to identify topics related to violence. Experimental results show that deriving word priors using our proposed relative entropy method is more effective than the widely-used information gain method. Moreover, VDM gives higher violence classification results and produces more coherent violence-related topics compared to a few competitive baselines.
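
The abstract does not state the formula used to derive word priors, so the following is only a generic illustration of the intuition it describes, not the authors' method. It assumes a word is summarised by hypothetical counts over four topic categories and compares its distribution with a uniform reference using relative entropy (Kullback-Leibler divergence). The function names, the counts, and the choice of a uniform reference are all assumptions made for this sketch.

```python
import math


def normalise(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def entropy(p):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def relative_entropy(p, q):
    """KL divergence D(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical counts of two words across four topic categories (made-up data).
focused_word = normalise([97, 1, 1, 1])      # usage concentrated in one topic
diverse_word = normalise([25, 25, 25, 25])   # usage spread evenly over topics
uniform = [0.25, 0.25, 0.25, 0.25]           # assumed reference distribution

focused_kl = relative_entropy(focused_word, uniform)
diverse_kl = relative_entropy(diverse_word, uniform)
```

Under these assumptions the concentrated word has lower entropy and a larger divergence from the uniform reference than the evenly spread word, which matches the abstract's statement that low entropy words are the more informative ones. Against a uniform reference over K categories the two quantities are tied by D(p || u) = log K - H(p).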

Original language	English
Title of host publication	The 6th International Joint Conference on Natural Language Processing (IJCNLP)
Place of Publication	Nagoya (JP)
Pages	109-117
Number of pages	9
Publication status	Published - 2013
Event	6th International Joint Conference on Natural Language Processing - Nagoya, Japan. Duration: 14 Oct 2013 → 18 Oct 2013

Conference

Conference	6th International Joint Conference on Natural Language Processing
Abbreviated title	IJCNLP 2013
Country/Territory	Japan
City	Nagoya
Period	14/10/13 → 18/10/13

Cite this

@inproceedings{940599063f214e6b8f0c53701835d6bf,

title = "A weakly-supervised Bayesian model for violence detection from social media",

abstract = "Social streams have proven to be the most up-to-date and inclusive information on current events. In this paper we propose a novel probabilistic modelling framework, called violence detection model (VDM), which enables the identification of text containing violent content and extraction of violence-related topics over social media data. The proposed VDM model does not require any labeled corpora for training; instead, it only needs the incorporation of word prior knowledge which captures whether a word indicates violence or not. We propose a novel approach of deriving word prior knowledge using the relative entropy measurement of words based on the intuition that low entropy words are indicative of semantically coherent topics and therefore more informative, while high entropy words indicate words whose usage is more topically diverse and therefore less informative. Our proposed VDM model has been evaluated on the TREC Microblog 2011 dataset to identify topics related to violence. Experimental results show that deriving word priors using our proposed relative entropy method is more effective than the widely-used information gain method. Moreover, VDM gives higher violence classification results and produces more coherent violence-related topics compared to a few competitive baselines.",

author = "Elizabeth Cano and Yulan He and Kang Liu and Jun Zhao",

year = "2013",

language = "English",

pages = "109--117",

booktitle = "The 6th International Joint Conference on Natural Language Processing (IJCNLP)",

note = "6th International Joint Conference on Natural Language Processing, IJCNLP 2013; Conference date: 14-10-2013 through 18-10-2013",

}

TY - GEN

T1 - A weakly-supervised Bayesian model for violence detection from social media

AU - Cano, Elizabeth

AU - He, Yulan

AU - Liu, Kang

AU - Zhao, Jun

PY - 2013

Y1 - 2013

N2 - Social streams have proven to be the most up-to-date and inclusive information on current events. In this paper we propose a novel probabilistic modelling framework, called violence detection model (VDM), which enables the identification of text containing violent content and extraction of violence-related topics over social media data. The proposed VDM model does not require any labeled corpora for training; instead, it only needs the incorporation of word prior knowledge which captures whether a word indicates violence or not. We propose a novel approach of deriving word prior knowledge using the relative entropy measurement of words based on the intuition that low entropy words are indicative of semantically coherent topics and therefore more informative, while high entropy words indicate words whose usage is more topically diverse and therefore less informative. Our proposed VDM model has been evaluated on the TREC Microblog 2011 dataset to identify topics related to violence. Experimental results show that deriving word priors using our proposed relative entropy method is more effective than the widely-used information gain method. Moreover, VDM gives higher violence classification results and produces more coherent violence-related topics compared to a few competitive baselines.

AB - Social streams have proven to be the most up-to-date and inclusive information on current events. In this paper we propose a novel probabilistic modelling framework, called violence detection model (VDM), which enables the identification of text containing violent content and extraction of violence-related topics over social media data. The proposed VDM model does not require any labeled corpora for training; instead, it only needs the incorporation of word prior knowledge which captures whether a word indicates violence or not. We propose a novel approach of deriving word prior knowledge using the relative entropy measurement of words based on the intuition that low entropy words are indicative of semantically coherent topics and therefore more informative, while high entropy words indicate words whose usage is more topically diverse and therefore less informative. Our proposed VDM model has been evaluated on the TREC Microblog 2011 dataset to identify topics related to violence. Experimental results show that deriving word priors using our proposed relative entropy method is more effective than the widely-used information gain method. Moreover, VDM gives higher violence classification results and produces more coherent violence-related topics compared to a few competitive baselines.

M3 - Conference publication

SP - 109

EP - 117

BT - The 6th International Joint Conference on Natural Language Processing (IJCNLP)

CY - Nagoya (JP)

T2 - 6th International Joint Conference on Natural Language Processing

Y2 - 14 October 2013 through 18 October 2013

ER -
