Content-Based Conflict-of-Interest Detection on Wikipedia

Udochukwu Orizu, Yulan He

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Wikipedia is one of the most visited websites in the world. On Wikipedia, Conflict-of-Interest (CoI) editing happens when an editor uses Wikipedia to advance their own interests or relationships. This includes, for example, paid editing done by organisations for public relations purposes. CoI detection is highly subjective and, though closely related to vandalism and bias detection, is a more difficult problem. In this paper, we frame CoI detection as a binary classification problem and explore various features that can be used to train supervised classifiers for CoI detection on Wikipedia articles. Our experimental results show that the best F-measure, 0.67, is achieved by training an SVM on a combination of features including stylometric, bias and emotion features. As we cannot be certain that our non-CoI set contains no CoI articles, we have also explored the use of one-class classification for CoI detection. The results show that stylometric features alone outperform the other feature types and combinations of them, giving an F-measure of 0.63. Also, while the binary classifiers give higher recall values (0.81∼0.94), the one-class classifier attains higher precision values (0.69∼0.74).
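
To make the reported metrics concrete, the following minimal sketch shows how precision, recall and F-measure relate to one another, assuming the F-measure is the balanced F1 (the harmonic mean of precision and recall). The function name and the confusion counts are hypothetical illustrations and are not taken from the paper.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive (CoI) class.

    tp: CoI articles correctly flagged
    fp: non-CoI articles wrongly flagged
    fn: CoI articles that were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, NOT results from the paper.
# A recall-oriented classifier catches most CoI articles but raises many false alarms.
recall_oriented = precision_recall_f1(tp=90, fp=90, fn=10)
# A precision-oriented classifier raises fewer false alarms but misses more CoI articles.
precision_oriented = precision_recall_f1(tp=60, fp=20, fn=40)

for name, (p, r, f) in [("recall-oriented", recall_oriented),
                        ("precision-oriented", precision_oriented)]:
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

With these made-up counts, two classifiers with quite different precision and recall profiles end up with similar F1 scores, which is why the abstract reports precision and recall ranges alongside the F-measure.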
Original language: English
Title of host publication: The 11th International Conference on Language Resources and Evaluation (LREC)
Pages: 166-173
Publication status: Published - 8 May 2018

Fingerprint

Classifiers
Public relations
Websites

Keywords

  • Wikipedia
  • Conflict-of-Interest Detection
  • Bias
  • Stylometric features
  • One-class classification

Cite this

Orizu, U., & He, Y. (2018). Content-Based Conflict-of-Interest Detection on Wikipedia. In The 11th International Conference on Language Resources and Evaluation (LREC) (pp. 166-173)
@inproceedings{4d2cd95d54184a8fbad4386379e3b105,
title = "Content-Based Conflict-of-Interest Detection on Wikipedia",
keywords = "Wikipedia, Conflict-of-Interest Detection, Bias, Stylometric features, One-class classification",
author = "Udochukwu Orizu and Yulan He",
year = "2018",
month = "5",
day = "8",
language = "English",
isbn = "979-109554600-9",
pages = "166--173",
booktitle = "The 11th International Conference on Language Resources and Evaluation (LREC)",

}

TY - GEN

T1 - Content-Based Conflict-of-Interest Detection on Wikipedia

AU - Orizu, Udochukwu

AU - He, Yulan

PY - 2018/5/8

Y1 - 2018/5/8

KW - Wikipedia

KW - Conflict-of-Interest Detection

KW - Bias

KW - Stylometric features

KW - One-class classification

UR - http://lrec2018.lrec-conf.org/en/

M3 - Conference contribution

SN - 979-109554600-9

SP - 166

EP - 173

BT - The 11th International Conference on Language Resources and Evaluation (LREC)

ER -
