Abstract
In this paper, we introduce a new English Twitter-based dataset for online abuse and cyberbullying detection. Comprising 62,587 tweets, the dataset was sourced from Twitter using specific query terms designed to retrieve tweets with a high probability of containing various forms of bullying and offensive content, including insult, profanity, sarcasm, threat, porn and exclusion. Analysis of the dataset confirmed common cyberbullying themes reported by other studies and revealed interesting relationships between the classes. The dataset was used to train a number of transformer-based deep learning models, which returned impressive results.
Original language | English |
---|---|
Title of host publication | Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021) |
Editors | Aida Mostafazadeh Davani, Douwe Kiela, Mathias Lambert, Bertie Vidgen, Vinodkumar Prabhakaran, Zeerak Waseem |
Publisher | Association for Computational Linguistics |
Pages | 146-156 |
Number of pages | 11 |
ISBN (Print) | 9781954085596 |
Publication status | Published - Aug 2021 |
Event | The 5th Workshop on Online Abuse and Harms - Duration: 6 Aug 2021 → 6 Aug 2021 https://www.workshopononlineabuse.com/past-workshops/woah-2021-website |
Conference
Conference | The 5th Workshop on Online Abuse and Harms |
---|---|
Abbreviated title | WOAH 2021 |
Period | 6/08/21 → 6/08/21 |