Quantifying evidence in forensic authorship analysis

Tim Grant

doi:10.1558/ijsll.v14i1.1

Corresponding author for this work: Tim Grant

Research output: Contribution to journal › Article › peer-review

Abstract

Judicial interest in 'scientific' evidence has driven recent work to quantify results in forensic linguistic authorship analysis. Through a methodological discussion and a worked example, this paper examines the issues which complicate attempts to quantify results in such work. The solution suggested to some of the difficulties is a sampling and testing strategy which helps to identify potentially useful, valid and reliable markers of authorship. An important feature of the sampling strategy is that markers identified as being generally valid and reliable are retested for use in specific authorship analysis cases. The suggested approach for drawing quantified conclusions combines discriminant function analysis and Bayesian likelihood measures. The worked example starts with twenty comparison texts for each of three potential authors and then uses a progressively smaller comparison corpus, reducing to fifteen, ten, five and finally three texts per author. This worked example demonstrates how reducing the amount of data affects the way conclusions can be drawn. With greater numbers of reference texts, quantified and safe attributions are shown to be possible, but as the number of reference texts falls, the analysis shows that the conclusion which should be reached is that no attribution can be made. At no point does the testing process result in a misattribution.
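The abstract names Bayesian likelihood measures but does not give the procedure, so the following is only a minimal sketch of how Bayes' theorem might turn a discriminant score into a quantified conclusion. It assumes a single, one-dimensional score per text, a Gaussian model of each candidate author's reference scores, equal priors, and a 0.95 decision threshold; the function names, the threshold and the numbers are all hypothetical illustrations, not the paper's method or data.

```python
from math import exp, pi, sqrt
from statistics import mean, stdev


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


def posteriors(score, reference_scores, priors=None):
    """Apply Bayes' theorem over candidate authors for one questioned-text score.

    reference_scores maps author -> list of scores from that author's
    comparison texts (at least two per author, so stdev is defined).
    """
    authors = list(reference_scores)
    if priors is None:
        # Assumption: every candidate author is equally likely a priori.
        priors = {a: 1 / len(authors) for a in authors}
    weighted = {}
    for a in authors:
        mu = mean(reference_scores[a])
        sigma = stdev(reference_scores[a])
        weighted[a] = priors[a] * gaussian_pdf(score, mu, sigma)
    total = sum(weighted.values())
    if total == 0.0:
        # The score is implausible under every author model.
        return {a: 0.0 for a in authors}
    return {a: w / total for a, w in weighted.items()}


def attribute(score, reference_scores, threshold=0.95):
    """Return the best-supported author, or None when support is too weak."""
    post = posteriors(score, reference_scores)
    best = max(post, key=post.get)
    return best if post[best] >= threshold else None


# Invented scores for three hypothetical candidate authors.
reference_scores = {
    "A": [1.0, 1.2, 0.8, 1.1, 0.9],
    "B": [3.0, 3.2, 2.8, 3.1, 2.9],
    "C": [5.0, 5.2, 4.8, 5.1, 4.9],
}
print(attribute(1.05, reference_scores))  # prints A
print(attribute(2.0, reference_scores))   # prints None
```

Returning `None` when no posterior reaches the threshold mirrors the abstract's point that "no attribution can be made" is a legitimate conclusion, preferable to a forced choice between candidates.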

Original language	English
Pages (from-to)	1-25
Number of pages	25
Journal	International Journal of Speech, Language and the Law
Volume	14
Issue number	1
DOIs	https://doi.org/10.1558/ijsll.v14i1.1
Publication status	Published - 15 Oct 2007

Keywords

Authorship analysis
Bayes theorem
Discriminant analysis
Error
Forensic linguistics
Sampling

Cite this

@article{5a894beaa1eb4ce7ac03e6334fa301f7,
  title = "Quantifying evidence in forensic authorship analysis",
  abstract = "The judicial interest in 'scientific' evidence has driven recent work to quantify results for forensic linguistic authorship analysis. Through a methodological discussion and a worked example this paper examines the issues which complicate attempts to quantify results in work. The solution suggested to some of the difficulties is a sampling and testing strategy which helps to identify potentially useful, valid and reliable markers of authorship. An important feature of the sampling strategy is that these markers identified as being generally valid and reliable are retested for use in specific authorship analysis cases. The suggested approach for drawing quantified conclusions combines discriminant function analysis and Bayesian likelihood measures. The worked example starts with twenty comparison texts for each of three potential authors and then uses a progressively smaller comparison corpus, reducing to fifteen, ten, five and finally three texts per author. This worked example demonstrates how reducing the amount of data affects the way conclusions can be drawn. With greater numbers of reference texts quantified and safe attributions are shown to be possible, but as the number of reference texts reduces the analysis shows how the conclusion which should be reached is that no attribution can be made. The testing process at no point results in instances of a misattribution.",
  keywords = "Authorship analysis, Bayes theorem, Discriminant analysis, Error, Forensic linguistics, Sampling",
  author = "Tim Grant",
  year = "2007",
  month = oct,
  day = "15",
  doi = "10.1558/ijsll.v14i1.1",
  language = "English",
  volume = "14",
  pages = "1--25",
  journal = "International Journal of Speech, Language and the Law",
  issn = "1748-8885",
  publisher = "Equinox Publishing Ltd",
  number = "1",
}

TY - JOUR
T1 - Quantifying evidence in forensic authorship analysis
AU - Grant, Tim
PY - 2007/10/15
Y1 - 2007/10/15

N2 - The judicial interest in 'scientific' evidence has driven recent work to quantify results for forensic linguistic authorship analysis. Through a methodological discussion and a worked example this paper examines the issues which complicate attempts to quantify results in work. The solution suggested to some of the difficulties is a sampling and testing strategy which helps to identify potentially useful, valid and reliable markers of authorship. An important feature of the sampling strategy is that these markers identified as being generally valid and reliable are retested for use in specific authorship analysis cases. The suggested approach for drawing quantified conclusions combines discriminant function analysis and Bayesian likelihood measures. The worked example starts with twenty comparison texts for each of three potential authors and then uses a progressively smaller comparison corpus, reducing to fifteen, ten, five and finally three texts per author. This worked example demonstrates how reducing the amount of data affects the way conclusions can be drawn. With greater numbers of reference texts quantified and safe attributions are shown to be possible, but as the number of reference texts reduces the analysis shows how the conclusion which should be reached is that no attribution can be made. The testing process at no point results in instances of a misattribution.

AB - The judicial interest in 'scientific' evidence has driven recent work to quantify results for forensic linguistic authorship analysis. Through a methodological discussion and a worked example this paper examines the issues which complicate attempts to quantify results in work. The solution suggested to some of the difficulties is a sampling and testing strategy which helps to identify potentially useful, valid and reliable markers of authorship. An important feature of the sampling strategy is that these markers identified as being generally valid and reliable are retested for use in specific authorship analysis cases. The suggested approach for drawing quantified conclusions combines discriminant function analysis and Bayesian likelihood measures. The worked example starts with twenty comparison texts for each of three potential authors and then uses a progressively smaller comparison corpus, reducing to fifteen, ten, five and finally three texts per author. This worked example demonstrates how reducing the amount of data affects the way conclusions can be drawn. With greater numbers of reference texts quantified and safe attributions are shown to be possible, but as the number of reference texts reduces the analysis shows how the conclusion which should be reached is that no attribution can be made. The testing process at no point results in instances of a misattribution.

KW - Authorship analysis
KW - Bayes theorem
KW - Discriminant analysis
KW - Error
KW - Forensic linguistics
KW - Sampling
UR - http://www.scopus.com/inward/record.url?scp=35148900949&partnerID=8YFLogxK
U2 - 10.1558/ijsll.v14i1.1
DO - 10.1558/ijsll.v14i1.1
M3 - Article
AN - SCOPUS:35148900949
SN - 1748-8885
VL - 14
SP - 1
EP - 25
JO - International Journal of Speech, Language and the Law
JF - International Journal of Speech, Language and the Law
IS - 1
ER -
