Quantitative authorship attribution: an evaluation of techniques

Jack Grieve

doi:10.1093/llc/fqm020

Aston Institute for Forensic Linguistics

Research output: Contribution to journal › Article › peer-review

Abstract

The basic assumption of quantitative authorship attribution is that the author of a text can be selected from a set of possible authors by comparing the values of textual measurements in that text to their corresponding values in each possible author's writing sample. Over the past three centuries, many types of textual measurements have been proposed, but never before have the majority of these measurements been tested on the same dataset. A large-scale comparison of textual measurements is crucial if current techniques are to be used effectively and if new and more powerful techniques are to be developed. This article presents the results of a comparison of thirty-nine different types of textual measurements commonly used in attribution studies, in order to determine which are the best indicators of authorship. Based on the results of these tests, a more accurate approach to quantitative authorship attribution is proposed, which involves the analysis of many different textual measurements.
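
The basic assumption stated in the abstract can be illustrated with a minimal sketch. This is not the article's method or any of its thirty-nine measurements: the function-word list, the Manhattan distance, the sample texts, and the names `measure` and `attribute` are all assumptions chosen only to show the general shape of comparing a disputed text against each candidate author's writing sample.

```python
from collections import Counter

# Hypothetical measurement: relative frequencies of a few common function words.
# The word list is an illustrative assumption, not taken from the article.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in")


def measure(text):
    """Return the relative frequency of each function word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def attribute(disputed_text, author_samples):
    """Pick the author whose sample measurements are closest to the disputed text."""
    target = measure(disputed_text)

    def distance(author):
        # Manhattan distance between the two measurement vectors (an assumption).
        profile = measure(author_samples[author])
        return sum(abs(t - p) for t, p in zip(target, profile))

    return min(author_samples, key=distance)


# Toy writing samples, invented for illustration only.
samples = {
    "author_a": "the cat sat on the mat and the dog sat in the hall",
    "author_b": "to be or not to be is a question to ask of a friend",
}
print(attribute("the bird and the fish sat in the sun", samples))
```

On these toy samples the disputed text is closer to `author_a`, whose sample also relies heavily on "the". Real attribution studies would use far longer samples and, as the abstract argues, many different measurements rather than one.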

Original language	English
Pages (from-to)	251-270
Number of pages	20
Journal	Literary and Linguistic Computing
Volume	22
Issue number	3
Early online date	26 Jul 2007
DOIs	https://doi.org/10.1093/llc/fqm020
Publication status	Published - Sept 2007

Cite this

@article{c9659c07dd594f36beb41146dd2186cb,
title = "Quantitative authorship attribution: an evaluation of techniques",
abstract = "The basic assumption of quantitative authorship attribution is that the author of a text can be selected from a set of possible authors by comparing the values of textual measurements in that text to their corresponding values in each possible author's writing sample. Over the past three centuries, many types of textual measurements have been proposed, but never before have the majority of these measurements been tested on the same dataset. A large-scale comparison of textual measurements is crucial if current techniques are to be used effectively and if new and more powerful techniques are to be developed. This article presents the results of a comparison of thirty-nine different types of textual measurements commonly used in attribution studies, in order to determine which are the best indicators of authorship. Based on the results of these tests, a more accurate approach to quantitative authorship attribution is proposed, which involves the analysis of many different textual measurements.",
author = "Jack Grieve",
year = "2007",
month = sep,
doi = "10.1093/llc/fqm020",
language = "English",
volume = "22",
pages = "251--270",
journal = "Literary and Linguistic Computing",
issn = "1477-4615",
publisher = "Oxford University Press",
number = "3",
}

TY  - JOUR
T1  - Quantitative authorship attribution
T2  - an evaluation of techniques
AU  - Grieve, Jack
PY  - 2007/9
Y1  - 2007/9
N2  - The basic assumption of quantitative authorship attribution is that the author of a text can be selected from a set of possible authors by comparing the values of textual measurements in that text to their corresponding values in each possible author's writing sample. Over the past three centuries, many types of textual measurements have been proposed, but never before have the majority of these measurements been tested on the same dataset. A large-scale comparison of textual measurements is crucial if current techniques are to be used effectively and if new and more powerful techniques are to be developed. This article presents the results of a comparison of thirty-nine different types of textual measurements commonly used in attribution studies, in order to determine which are the best indicators of authorship. Based on the results of these tests, a more accurate approach to quantitative authorship attribution is proposed, which involves the analysis of many different textual measurements.
AB  - The basic assumption of quantitative authorship attribution is that the author of a text can be selected from a set of possible authors by comparing the values of textual measurements in that text to their corresponding values in each possible author's writing sample. Over the past three centuries, many types of textual measurements have been proposed, but never before have the majority of these measurements been tested on the same dataset. A large-scale comparison of textual measurements is crucial if current techniques are to be used effectively and if new and more powerful techniques are to be developed. This article presents the results of a comparison of thirty-nine different types of textual measurements commonly used in attribution studies, in order to determine which are the best indicators of authorship. Based on the results of these tests, a more accurate approach to quantitative authorship attribution is proposed, which involves the analysis of many different textual measurements.
UR  - http://llc.oxfordjournals.org/content/22/3/251
U2  - 10.1093/llc/fqm020
DO  - 10.1093/llc/fqm020
M3  - Article
SN  - 1477-4615
VL  - 22
SP  - 251
EP  - 270
JO  - Literary and Linguistic Computing
JF  - Literary and Linguistic Computing
IS  - 3
ER  - 
