Research into cyberbullying detection has increased in recent years, due in part to the proliferation of cyberbullying across social media and its detrimental effect on young people. A growing body of work is emerging on automated approaches to cyberbullying detection. These approaches utilise machine learning and natural language processing techniques to identify the characteristics of a cyberbullying exchange and automatically detect cyberbullying by matching textual data to the identified traits. In this paper, we present a systematic review of published research (as identified via Scopus, ACM and IEEE Xplore bibliographic databases) on cyberbullying detection approaches. On the basis of our extensive literature review, we categorise existing approaches into 4 main classes, namely; supervised learning, lexicon based, rule based and mixed-initiative approaches. Supervised learning-based approaches typically use classifiers such as SVM and Naïve Bayes to develop predictive models for cyberbullying detection. Lexicon based systems utilise word lists and use the presence of words within the lists to detect cyberbullying. Rules-based approaches match text to predefined rules to identify bullying and mixed-initiatives approaches combine human-based reasoning with one or more of the aforementioned approaches. We found lack of quality representative labelled datasets and non-holistic consideration of cyberbullying by researchers when developing detection systems are two key challenges facing cyberbullying detection research. This paper essentially maps out the state-of-the-art in cyberbullying detection research and serves as a resource for researchers to determine where to best direct their future research efforts in this field.
|Number of pages||22|
|Journal||IEEE Transactions on Affective Computing|
|Early online date||10 Oct 2017|
|Publication status||Published - 1 Mar 2020|
Bibliographical note© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
- Abuse and crime involving computers
- data mining
- Electronic mail
- machine learning
- natural language processing
- Sentiment analysis
- sentiment analysis
- Social network services
- social networking
- supervised learning