TY - GEN
T1 - Target-Based Offensive Language Identification
AU - Zampieri, Marcos
AU - Morgan, Skye
AU - North, Kai
AU - Ranasinghe, Tharindu
AU - Simmons, Austin
AU - Khandelwal, Paridhi
AU - Rosenthal, Sara
AU - Nakov, Preslav
N1 - Materials published in or after 2016 are licensed under a Creative Commons Attribution 4.0 International License.
PY - 2023/7/9
Y1 - 2023/7/9
N2 - We present TBO, a new dataset for Target-based Offensive language identification. TBO contains post-level annotations regarding the harmfulness of an offensive post and token-level annotations comprising the target and the offensive argument expression. Popular offensive language identification datasets for social media focus on annotation taxonomies only at the post level; more recently, some datasets have been released that feature only token-level annotations. TBO is an important resource that bridges the gap between post-level and token-level annotation datasets by introducing a single comprehensive unified annotation taxonomy. We use the TBO taxonomy to annotate post-level and token-level offensive language in English Twitter posts. We release an initial dataset of over 4,500 instances collected from Twitter, and we carry out multiple experiments to compare the performance of different models trained and tested on TBO.
AB - We present TBO, a new dataset for Target-based Offensive language identification. TBO contains post-level annotations regarding the harmfulness of an offensive post and token-level annotations comprising the target and the offensive argument expression. Popular offensive language identification datasets for social media focus on annotation taxonomies only at the post level; more recently, some datasets have been released that feature only token-level annotations. TBO is an important resource that bridges the gap between post-level and token-level annotation datasets by introducing a single comprehensive unified annotation taxonomy. We use the TBO taxonomy to annotate post-level and token-level offensive language in English Twitter posts. We release an initial dataset of over 4,500 instances collected from Twitter, and we carry out multiple experiments to compare the performance of different models trained and tested on TBO.
UR - http://www.scopus.com/inward/record.url?scp=85172243992&partnerID=8YFLogxK
UR - https://aclanthology.org/2023.acl-short.66/
U2 - 10.18653/v1/2023.acl-short.66
DO - 10.18653/v1/2023.acl-short.66
M3 - Conference publication
AN - SCOPUS:85172243992
VL - 2
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 762
EP - 770
BT - Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics
PB - Association for Computational Linguistics (ACL)
T2 - 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Y2 - 9 July 2023 through 14 July 2023
ER -