Data Expansion Using WordNet-based Semantic Expansion and Word Disambiguation for Cyberbullying Detection

Md Saroar Jahan, Mourad Oussalah, Muhidin Mohamed

Research output: Chapter in Book/Published conference output > Conference publication

Abstract

Automatic identification of cyberbullying from textual content is known to be a challenging task. The challenges arise from the inherent
structure of cyberbullying and the lack of a large-scale labeled corpus that would enable efficient machine-learning-based tools, including neural
networks. This paper advocates a data augmentation-based approach that could enhance the automatic detection of cyberbullying in
social media texts. We use both word sense disambiguation and synonymy relation in WordNet lexical database to generate coherent
equivalent utterances of cyberbullying input data. The disambiguation and semantic expansion are intended to overcome the inherent
limitations of social media posts, such as an abundance of unstructured constructs and limited semantic content. To test the
feasibility of the approach, a novel protocol was employed to collect cyberbullying traces from the AskFm forum, and a roughly 10K-size dataset
was manually labeled. Next, the problem of cyberbullying identification is viewed as a binary classification problem using an
elaborated data augmentation strategy and an appropriate classifier. For the latter, a Convolutional Neural Network (CNN) architecture
with FastText and BERT was put forward, whose results were compared against commonly employed Naïve Bayes (NB) and Logistic
Regression (LR) classifiers with and without data augmentation. The results were promising, yielding a classifier accuracy of almost
98.4%, an improvement of more than 4% over baseline results.
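
The augmentation idea described in the abstract (disambiguate a word in context, then substitute synonyms of the chosen sense) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `TOY_LEXICON` is a hypothetical hand-made sense inventory standing in for WordNet, the disambiguation step is a simplified gloss-overlap (Lesk-style) heuristic chosen here as an assumption, and the function names are invented for the example. A real pipeline would query the WordNet database itself, for instance through NLTK.

```python
from typing import Dict, List, Optional

# Hypothetical toy sense inventory standing in for WordNet (NOT real WordNet data).
# Each word maps to a list of senses; each sense has a gloss and some synonyms.
TOY_LEXICON: Dict[str, List[dict]] = {
    "loser": [
        {"gloss": "a person who fails or is regarded as worthless",
         "synonyms": ["failure", "nonstarter"]},
        {"gloss": "a contestant or gambler who loses the contest or bet",
         "synonyms": ["also-ran"]},
    ],
}

STOPWORDS = {"a", "an", "the", "or", "to", "is", "as", "who", "of", "and",
             "you", "are", "such"}


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def disambiguate(word: str, context: List[str]) -> Optional[dict]:
    """Simplified Lesk: pick the sense whose gloss overlaps most with the context."""
    senses = TOY_LEXICON.get(word)
    if not senses:
        return None
    ctx = set(context) - {word}
    # On a tie, max() keeps the first sense listed.
    return max(senses, key=lambda s: len(ctx & set(tokenize(s["gloss"]))))


def expand(sentence: str) -> List[str]:
    """Generate variants by swapping each known word for synonyms of its chosen sense."""
    tokens = sentence.lower().split()
    context = tokenize(sentence)
    variants = []
    for i, tok in enumerate(tokens):
        sense = disambiguate(tok, context)
        if sense is None:
            continue
        for syn in sense["synonyms"]:
            variants.append(" ".join(tokens[:i] + [syn] + tokens[i + 1:]))
    return variants


if __name__ == "__main__":
    for variant in expand("you are such a worthless loser"):
        print(variant)
```

Each generated variant would inherit the label of the original post, which is how synonym substitution enlarges a labeled training set. Restricting substitution to the disambiguated sense is what keeps the variants coherent with the original meaning.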
Original language: English
Title of host publication: Data Expansion Using WordNet-based Semantic Expansion and Word Disambiguation for Cyberbullying Detection
Pages: 1761–1770
Number of pages: 10
Publication status: Published - 20 Jun 2022
Event: 13th Conference on Language Resources and Evaluation (LREC 2022) - Marseille, France
Duration: 20 Jun 2022 to 25 Jun 2022

Conference

Conference: 13th Conference on Language Resources and Evaluation (LREC 2022)
Country/Territory: France
City: Marseille
Period: 20/06/22 to 25/06/22

Bibliographical note

© European Language Resources Association (ELRA), licensed under CC-BY-NC-4.0
