OffensEval 2023: Offensive language identification in the age of Large Language Models

Marcos Zampieri, Sara Rosenthal, Preslav Nakov, Alphaeus Dmonte, Tharindu Ranasinghe

Research output: Contribution to journal › Review article › peer-review

Abstract

The OffensEval shared tasks organized as part of SemEval-2019–2020 were very popular, attracting over 1300 participating teams. The two editions of the shared task helped advance the state of the art in offensive language identification by providing the community with benchmark datasets in Arabic, Danish, English, Greek, and Turkish. The datasets were annotated using the OLID hierarchical taxonomy, which since then has become the de facto standard in general offensive language identification research and was widely used beyond OffensEval. We present a survey of OffensEval and related competitions, and we discuss the main lessons learned. We further evaluate the performance of Large Language Models (LLMs), which have recently revolutionized the field of Natural Language Processing. We use zero-shot prompting with six popular LLMs and zero-shot learning with two task-specific fine-tuned BERT models, and we compare the results against those of the top-performing teams at the OffensEval competitions. Our results show that while some LLMs such as Flan-T5 achieve competitive performance, in general LLMs lag behind the best OffensEval systems.
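The zero-shot prompting setup described in the abstract can be sketched as a simple prompt builder. The label names (OFF/NOT) follow level A of the OLID taxonomy; the exact prompt wording used in the paper is not given here, so this template is an illustrative assumption.

```python
def build_zero_shot_prompt(text: str) -> str:
    """Build a zero-shot offensive-language classification prompt.

    Follows OLID level A labels (OFF = offensive, NOT = not offensive).
    The prompt wording is a hypothetical example, not the paper's template.
    """
    return (
        "Classify the following tweet as offensive (OFF) or "
        "not offensive (NOT). Answer with a single label.\n\n"
        f"Tweet: {text}\n"
        "Label:"
    )

# The resulting string would be sent to an instruction-tuned LLM
# (e.g., Flan-T5) and the first generated token mapped back to a label.
prompt = build_zero_shot_prompt("I love this new phone!")
```

In a zero-shot setting like this, no task-specific training examples appear in the prompt; the model relies entirely on the label definitions embedded in the instruction.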
Original language: English
Pages (from-to): 1416-1435
Number of pages: 20
Journal: Natural Language Engineering
Volume: 29
Issue number: 6
Publication status: Published - 6 Dec 2023

Bibliographical note

© The Author(s), 2023. This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.

Keywords

  • Machine learning
  • Text classification
