Abstract
The widespread presence of offensive content is a major issue in social media. This has motivated the development of computational models to identify such content in posts or conversations. Most of these models, however, treat offensive language identification as an isolated task. Very recently, a few datasets have been annotated with post-level offensiveness and related phenomena, such as offensive tokens, humor, and engaging content, creating the opportunity to model related tasks jointly. Joint modeling can improve the explainability of offensive language detection systems and potentially aid human moderators. This study proposes a novel multi-task learning (MTL) architecture that can predict: (1) offensiveness at both the post and token levels in English; and (2) offensiveness and related subjective tasks, such as humor, engaging content, and gender bias identification, in multilingual settings. Our results show that the proposed multi-task learning architecture outperforms current state-of-the-art methods trained to identify offense at the post level. We further demonstrate that MTL outperforms single-task learning (STL) across different tasks and language combinations.
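The core idea of such an MTL setup can be sketched as a shared encoder feeding separate task heads: one head classifies the whole post, another tags individual tokens. The sketch below is illustrative only and not the paper's actual architecture; the class name, layer sizes, and pooling strategy are all assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskOffenseModel(nn.Module):
    """Illustrative shared-encoder MTL model (not the paper's exact design):
    one transformer encoder feeds a post-level head (offensive vs. not)
    and a token-level head (per-token offensiveness tags)."""

    def __init__(self, vocab_size=1000, d_model=64,
                 n_post_labels=2, n_token_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # shared layers
        self.post_head = nn.Linear(d_model, n_post_labels)    # post-level task
        self.token_head = nn.Linear(d_model, n_token_labels)  # token-level task

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))      # (batch, seq, d_model)
        post_logits = self.post_head(h.mean(dim=1))  # mean-pooled post view
        token_logits = self.token_head(h)            # per-token logits
        return post_logits, token_logits

model = MultiTaskOffenseModel()
x = torch.randint(0, 1000, (2, 8))  # 2 posts, 8 token ids each
post_logits, token_logits = model(x)
print(post_logits.shape, token_logits.shape)
```

Joint training would typically sum (or weight) a cross-entropy loss from each head, so gradients from both tasks update the shared encoder.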
| Original language | English |
| --- | --- |
| Pages (from-to) | 613-630 |
| Journal | Journal of Intelligent Information Systems |
| Volume | 60 |
| Issue number | 3 |
| Early online date | 29 Apr 2023 |
| DOIs | |
| Publication status | Published - Jun 2023 |
Bibliographical note
Copyright © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature. This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s10844-023-00787-z
Keywords
- Deep learning
- Multi-task learning
- Offensive language identification
- Transformers