Abstract
The basic assumption of quantitative authorship attribution is that the author of a text can be selected from a set of possible authors by comparing the values of textual measurements in that text to their corresponding values in each possible author's writing sample. Over the past three centuries, many types of textual measurements have been proposed, but never before have the majority of these measurements been tested on the same dataset. A large-scale comparison of textual measurements is crucial if current techniques are to be used effectively and if new and more powerful techniques are to be developed. This article presents the results of a comparison of thirty-nine different types of textual measurements commonly used in attribution studies, in order to determine which are the best indicators of authorship. Based on the results of these tests, a more accurate approach to quantitative authorship attribution is proposed, which involves the analysis of many different textual measurements.
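The basic assumption stated in the abstract can be illustrated with a minimal sketch. This is not the article's method or one of its thirty-nine measurements. The feature set (`FUNCTION_WORDS`), the helper names `measure` and `attribute`, the Manhattan distance, and the toy writing samples are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical feature set: a handful of common English function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it"]


def measure(text):
    """Return the relative frequency of each function word in the text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def attribute(anonymous_text, samples):
    """Pick the candidate whose writing sample is closest to the text.

    Closeness is the Manhattan distance between measurement vectors,
    an assumed stand-in for whatever comparison a real study would use.
    """
    target = measure(anonymous_text)

    def distance(author):
        profile = measure(samples[author])
        return sum(abs(t - p) for t, p in zip(target, profile))

    return min(samples, key=distance)


# Toy writing samples for two hypothetical candidate authors.
samples = {
    "author_a": "the cat sat on the mat and the dog sat in the sun",
    "author_b": "it is true that it rains a lot and that it is cold",
}

anonymous = "the bird sat on the fence and the fox hid in the den"
print(attribute(anonymous, samples))  # prints "author_a"
```

A realistic study would use far longer samples and, as the abstract argues, many different measurements rather than a single feature set.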
Original language | English |
---|---|
Pages (from-to) | 251-270 |
Number of pages | 20 |
Journal | Literary and Linguistic Computing |
Volume | 22 |
Issue number | 3 |
Early online date | 26 Jul 2007 |
Publication status | Published - Sept 2007 |