Abstract
The current investigation addresses a vital lacuna in forensic authorship studies, and more concretely, in Native Language Influence Detection (NLID) research: narrowing down a speaker’s native dialect rather than only their native language (L1), which might not be enough when carrying out sociolinguistic profiling tasks. Native Dialect Influence Detection (NDID), the focus of our study, can thus be a valuable aid at the investigative level. We approach this topic by providing a comprehensive analysis of linguistic features that serve to identify two non-contact dialects of L1 Spanish (i.e., Mexican and Peninsular varieties) in texts written in L2 English and drawn from Tripadvisor. Our main aim is to investigate whether an author’s L2 features can point to their native dialect, rather than only to their native language. Findings point to L1 dialectal transfer of punctuation marks, adjectives of affect, and intensifiers: these linguistic features, even when expressed in an L2, show a culturally bound use. Additionally, we implemented an automatic classifier that achieved an accuracy of 69% in categorizing test data, using only linguistic features that have explanatory power and can aid linguistic theory. This is key for explainability in the forensic context, which Native Language Identification (NLI) studies tend to neglect (Kingston, 2019). Results show that L1 Spanish dialects can be differentiated by analyzing L2 English text, pointing to NDID as a fertile approach for narrowing down candidate L1 dialects of a language when analyzing L2 data.
Original language | English |
---|---|
Pages (from-to) | 120-145 |
Number of pages | 26 |
Journal | Language and Law/Linguagem e Direito |
Volume | 9 |
Issue number | 1 |
Publication status | Published - 23 Nov 2022 |
Bibliographical note
Copyright © 2022 Andrea Mojedano Batel, Mitchell Abrams, Piotr Pęzik. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/).
Keywords
- Native dialect influence detection
- Native language influence detection
- Authorship analysis
- Language variety identification
- Spanish civil war