Native language influence detection for forensic authorship analysis: Identifying L1 Persian bloggers

Ria Perkins; Tim Grant

doi:10.1558/ijsll.30844

Research output: Contribution to journal › Article › peer-review

Abstract

This article demonstrates and examines the potential use of interlingual identifiers for forensic authorship analysis and native language influence detection (NLID). The work focuses on the practical applications of native language (L1) identifiers by a human analyst in investigative situations. Using naturally occurring blog posts where the writer self-identifies as a native Persian speaker, a human analyst derived and coded sets of non-native features. Two logistic regression models were built: the first was used to select features to distinguish L1 Persian speakers from L1 English speakers in their English writings, the second developed a feature list to contrast L1 languages that are geographically and linguistically close to Persian. The results clearly demonstrate that interlingual identifiers have the potential to aid in determining the L1 of an anonymous author and can be used by a human analyst in a short forensically realistic example text. This article demonstrates that NLID is possible beyond the more common computational approaches and can form a useful tool in the forensic linguist’s toolbox. This study is not a statistical validation study; instead it demonstrates how a sociolinguistic approach can complement more traditional computational approaches.

Original language	English
Pages (from-to)	1-20
Number of pages	20
Journal	International Journal of Speech, Language and the Law
Volume	25
Issue number	1
DOIs	https://doi.org/10.1558/ijsll.30844
Publication status	Published - 10 Sept 2018

Keywords

Authorship analysis
Linguistic profiling
Native language identification
Native language influence detection
Persian

Access to Document

10.1558/ijsll.30844

©2018, Equinox Publishing. Accepted author manuscript.

Cite this

@article{119324c5576f4e0284070ee63353a054,

title = "Native language influence detection for forensic authorship analysis: Identifying {L1} {P}ersian bloggers",

abstract = "This article demonstrates and examines the potential use of interlingual identifiers for forensic authorship analysis and native language influence detection (NLID). The work focuses on the practical applications of native language (L1) identifiers by a human analyst in investigative situations. Using naturally occurring blog posts where the writer self-identifies as a native Persian speaker, a human analyst derived and coded sets of non-native features. Two logistic regression models were built: the first was used to select features to distinguish L1 Persian speakers from L1 English speakers in their English writings, the second developed a feature list to contrast L1 languages that are geographically and linguistically close to Persian. The results clearly demonstrate that interlingual identifiers have the potential to aid in determining the L1 of an anonymous author and can be used by a human analyst in a short forensically realistic example text. This article demonstrates that NLID is possible beyond the more common computational approaches and can form a useful tool in the forensic linguist{\textquoteright}s toolbox. This study is not a statistical validation study; instead it demonstrates how a sociolinguistic approach can complement more traditional computational approaches.",

keywords = "Authorship analysis, Linguistic profiling, Native language identification, Native language influence detection, Persian",

author = "Ria Perkins and Tim Grant",

note = "{\textcopyright}2018, Equinox Publishing",

year = "2018",

month = sep,

day = "10",

doi = "10.1558/ijsll.30844",

language = "English",

volume = "25",

pages = "1--20",

journal = "International Journal of Speech, Language and the Law",

issn = "1748-8885",

publisher = "Equinox Publishing Ltd",

number = "1",

}

TY - JOUR

T1 - Native language influence detection for forensic authorship analysis

T2 - Identifying L1 Persian bloggers

AU - Perkins, Ria

AU - Grant, Tim

PY - 2018/9/10

Y1 - 2018/9/10

AB - This article demonstrates and examines the potential use of interlingual identifiers for forensic authorship analysis and native language influence detection (NLID). The work focuses on the practical applications of native language (L1) identifiers by a human analyst in investigative situations. Using naturally occurring blog posts where the writer self-identifies as a native Persian speaker, a human analyst derived and coded sets of non-native features. Two logistic regression models were built: the first was used to select features to distinguish L1 Persian speakers from L1 English speakers in their English writings, the second developed a feature list to contrast L1 languages that are geographically and linguistically close to Persian. The results clearly demonstrate that interlingual identifiers have the potential to aid in determining the L1 of an anonymous author and can be used by a human analyst in a short forensically realistic example text. This article demonstrates that NLID is possible beyond the more common computational approaches and can form a useful tool in the forensic linguist’s toolbox. This study is not a statistical validation study; instead it demonstrates how a sociolinguistic approach can complement more traditional computational approaches.

KW - Authorship analysis

KW - Linguistic profiling

KW - Native language identification

KW - Native language influence detection

KW - Persian

UR - http://www.scopus.com/inward/record.url?scp=85053307409&partnerID=8YFLogxK

U2 - 10.1558/ijsll.30844

DO - 10.1558/ijsll.30844

M3 - Article

AN - SCOPUS:85053307409

SN - 1748-8885

VL - 25

SP - 1

EP - 20

JO - International Journal of Speech, Language and the Law

JF - International Journal of Speech, Language and the Law

IS - 1

ER -
