Self-training from labeled features for sentiment analysis

Research output: Contribution to journal (Article)

Abstract

Sentiment analysis is concerned with automatically identifying the sentiment or opinion expressed in a given piece of text. Most prior work either uses prior lexical knowledge, defined as the sentiment polarity of words, or views the task as a text classification problem and relies on labeled corpora to train a sentiment classifier. While lexicon-based approaches do not adapt well to different domains, corpus-based approaches require expensive manual annotation effort. In this paper, we propose a novel framework in which an initial classifier is learned by incorporating prior information extracted from an existing sentiment lexicon, with preferences on the expected sentiment labels of those lexicon words expressed using generalized expectation criteria. Documents classified with high confidence are then used as pseudo-labeled examples for automatic domain-specific feature acquisition. The word-class distributions of these self-learned features are estimated from the pseudo-labeled examples and are used to train another classifier by constraining the model's predictions on unlabeled instances. Experiments on both the movie-review data and the multi-domain sentiment dataset show that our approach attains comparable or better performance than existing weakly-supervised sentiment classification methods despite using no labeled documents.
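
The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. In particular, the generalized expectation criteria are replaced here by a simple lexicon vote, and the second classifier is a plain average of the estimated word-class distributions. The toy lexicon, the function names, and the `threshold` and `min_count` parameters are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

CLASSES = ("pos", "neg")

# Hypothetical toy lexicon; the paper draws on an existing sentiment lexicon.
LEXICON = {"good": "pos", "great": "pos", "bad": "neg", "awful": "neg"}


def lexicon_scores(tokens, lexicon):
    """Normalized class scores from lexicon hits (stand-in for the initial classifier)."""
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    total = sum(counts.values())
    if total == 0:
        return {c: 1.0 / len(CLASSES) for c in CLASSES}
    return {c: counts[c] / total for c in CLASSES}


def pseudo_label(docs, lexicon, threshold=0.8):
    """Keep only documents whose top class score reaches the confidence threshold."""
    labeled = []
    for tokens in docs:
        scores = lexicon_scores(tokens, lexicon)
        label = max(scores, key=scores.get)
        if scores[label] >= threshold:
            labeled.append((tokens, label))
    return labeled


def word_class_distributions(labeled, min_count=2):
    """Estimate P(class | word) from pseudo-labeled documents (document frequency)."""
    counts = defaultdict(Counter)
    for tokens, label in labeled:
        for t in set(tokens):
            counts[t][label] += 1
    dists = {}
    for word, c in counts.items():
        total = sum(c.values())
        if total >= min_count:  # drop words seen too rarely to trust
            dists[word] = {k: c[k] / total for k in CLASSES}
    return dists


def classify(tokens, dists):
    """Average the learned word-class distributions; None if no known word occurs."""
    known = [dists[t] for t in tokens if t in dists]
    if not known:
        return None
    avg = {c: sum(d[c] for d in known) / len(known) for c in CLASSES}
    return max(avg, key=avg.get)


if __name__ == "__main__":
    docs = [
        ["good", "plot", "great"],
        ["bad", "plot", "awful", "boring"],
        ["awful", "boring", "bad"],
        ["great", "fun", "good"],
        ["boring", "slow"],  # contains no lexicon word
    ]
    labeled = pseudo_label(docs, LEXICON)
    dists = word_class_distributions(labeled)
    print(classify(["boring", "slow"], dists))
```

In this toy run the word "boring" is absent from the lexicon but co-occurs only with confidently negative documents, so it becomes a self-learned negative feature and lets the second stage label a document the lexicon alone could not.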

Documents

  • Self-training from labeled features

    Rights statement: NOTICE: this is the author’s version of a work that was accepted for publication in Information processing and management. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in He, Y & Zhou, D, 'Self-training from labeled features for sentiment analysis' Information processing and management, vol 47, no. 4 (2011) DOI http://dx.doi.org/10.1016/j.ipm.2010.11.003 .

    Accepted author manuscript, 500 KB, PDF-document

Details

Original language: English
Pages (from-to): 606-616
Number of pages: 11
Journal: Information Processing and Management
Volume: 47
Issue: 4
DOI: http://dx.doi.org/10.1016/j.ipm.2010.11.003
State: Published - Jul 2011

Keywords

  • sentiment analysis, opinion mining, self-training, generalized expectation, self-learned features
