Target-Based Offensive Language Identification

Marcos Zampieri, Skye Morgan, Kai North, Tharindu Ranasinghe, Austin Simmons, Paridhi Khandelwal, Sara Rosenthal, Preslav Nakov

Research output: Chapter in Book/Published conference outputConference publication

Abstract

We present TBO, a new dataset for Target-based Offensive language identification. TBO contains post-level annotations regarding the harmfulness of an offensive post and token-level annotations comprising of the target and the offensive argument expression. Popular offensive language identification datasets for social media focus on annotation taxonomies only at the post level and more recently, some datasets have been released that feature only token-level annotations. TBO is an important resource that bridges the gap between post-level and token-level annotation datasets by introducing a single comprehensive unified annotation taxonomy. We use the TBO taxonomy to annotate post-level and token-level offensive language on English Twitter posts. We release an initial dataset of over 4,500 instances collected from Twitter and we carry out multiple experiments to compare the performance of different models trained and tested on TBO.

Original languageEnglish
Title of host publicationProceedings of the 61st Annual Meeting of the Association for Computational Linguistics
Subtitle of host publication (Volume 2: Short Papers)
PublisherAssociation for Computational Linguistics (ACL)
Pages762-770
Number of pages9
Volume2
ISBN (Electronic)9781959429715
DOIs
Publication statusPublished - 9 Jul 2023
Event61st Annual Meeting of the Association for Computational Linguistics, ACL 2023 - Toronto, Canada
Duration: 9 Jul 202314 Jul 2023

Publication series

NameProceedings of the Annual Meeting of the Association for Computational Linguistics
Volume2
ISSN (Print)0736-587X

Conference

Conference61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Country/TerritoryCanada
CityToronto
Period9/07/2314/07/23

Bibliographical note

Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License.

Fingerprint

Dive into the research topics of 'Target-Based Offensive Language Identification'. Together they form a unique fingerprint.

Cite this