Likelihood ratio calculation for a disputed-utterance analysis with limited available data

Geoffrey Stewart Morrison; Jonas Lindh; James M. Curran

doi:10.1016/j.specom.2013.11.004

Geoffrey Stewart Morrison^*, Jonas Lindh, James M. Curran

^*Corresponding author for this work

School of Social Sciences and Humanities

Research output: Contribution to journal › Article › peer-review

Abstract

We present a disputed-utterance analysis using relevant data, quantitative measurements and statistical models to calculate likelihood ratios. The acoustic data were taken from an actual forensic case in which the amount of data available to train the statistical models was small and the data point from the disputed word was far out on the tail of one of the modelled distributions. A procedure based on single multivariate Gaussian models for each hypothesis led to an unrealistically high likelihood ratio value with extremely poor reliability, but a procedure based on Hotelling's T² statistic and a procedure based on calculating a posterior predictive density produced more acceptable results. The Hotelling's T² procedure attempts to take account of the sampling uncertainty of the mean vectors and covariance matrices due to the small number of tokens used to train the models, and the posterior-predictive-density analysis integrates out the values of the mean vectors and covariance matrices as nuisance parameters. Data scarcity is common in forensic speech science and we argue that it is important not to accept extremely large calculated likelihood ratios at face value, but to consider whether such values can be supported given the size of the available data and modelling constraints.
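The contrast between a plug-in Gaussian model and a posterior predictive density can be illustrated with a minimal sketch. It is not the authors' procedure: it is univariate rather than multivariate, the token values are made up, and it assumes the standard noninformative prior p(μ, σ²) ∝ 1/σ², under which the posterior predictive density for a new observation is a Student's t distribution with n − 1 degrees of freedom, location equal to the sample mean, and scale s·√(1 + 1/n). The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def gaussian_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def student_t_logpdf(x, df, loc, scale):
    """Log density of a location-scale Student's t distribution."""
    z = (x - loc) / scale
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - (df + 1) / 2 * math.log1p(z * z / df))


def plug_in_log10_lr(x, tokens_a, tokens_b):
    """log10 LR treating the sample mean and std dev as if they were exact."""
    log_a = gaussian_logpdf(x, mean(tokens_a), stdev(tokens_a))
    log_b = gaussian_logpdf(x, mean(tokens_b), stdev(tokens_b))
    return (log_a - log_b) / math.log(10)


def predictive_log10_lr(x, tokens_a, tokens_b):
    """log10 LR from posterior predictive densities (assumed 1/sigma^2 prior)."""
    def log_predictive(tokens):
        n = len(tokens)
        scale = stdev(tokens) * math.sqrt(1 + 1 / n)
        return student_t_logpdf(x, n - 1, mean(tokens), scale)
    return (log_predictive(tokens_a) - log_predictive(tokens_b)) / math.log(10)


# Made-up measurements: five training tokens per hypothesis, and a disputed
# token that lies far out on the tail of the hypothesis-B model.
tokens_a = [1.0, 1.2, 0.9, 1.1, 0.8]
tokens_b = [3.0, 3.1, 2.9, 3.2, 2.8]
disputed = 0.5

print("plug-in log10 LR:   ", plug_in_log10_lr(disputed, tokens_a, tokens_b))
print("predictive log10 LR:", predictive_log10_lr(disputed, tokens_a, tokens_b))
```

With these made-up numbers both values favour hypothesis A, but the plug-in value is larger by many orders of magnitude. The heavier tails of the t distribution reflect how little five tokens say about the true mean and variance, which is the kind of moderation the abstract describes.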

Original language	English
Pages (from-to)	81-90
Number of pages	10
Journal	Speech Communication
Volume	58
DOIs	https://doi.org/10.1016/j.specom.2013.11.004
Publication status	Published - Mar 2014

Keywords

Disputed utterance
Forensic
Hotelling's T²
Likelihood ratio
Posterior predictive density
Reliability

Cite this

@article{decce735699543ebb17a802b60578999,

title = "Likelihood ratio calculation for a disputed-utterance analysis with limited available data",

abstract = "We present a disputed-utterance analysis using relevant data, quantitative measurements and statistical models to calculate likelihood ratios. The acoustic data were taken from an actual forensic case in which the amount of data available to train the statistical models was small and the data point from the disputed word was far out on the tail of one of the modelled distributions. A procedure based on single multivariate Gaussian models for each hypothesis led to an unrealistically high likelihood ratio value with extremely poor reliability, but a procedure based on Hotelling's T$^2$ statistic and a procedure based on calculating a posterior predictive density produced more acceptable results. The Hotelling's T$^2$ procedure attempts to take account of the sampling uncertainty of the mean vectors and covariance matrices due to the small number of tokens used to train the models, and the posterior-predictive-density analysis integrates out the values of the mean vectors and covariance matrices as nuisance parameters. Data scarcity is common in forensic speech science and we argue that it is important not to accept extremely large calculated likelihood ratios at face value, but to consider whether such values can be supported given the size of the available data and modelling constraints.",

keywords = "Disputed utterance, Forensic, Hotelling's T$^2$, Likelihood ratio, Posterior predictive density, Reliability",

author = "Morrison, {Geoffrey Stewart} and Lindh, Jonas and Curran, {James M.}",

year = "2014",

month = mar,

doi = "10.1016/j.specom.2013.11.004",

language = "English",

volume = "58",

pages = "81--90",

journal = "Speech Communication",

issn = "0167-6393",

publisher = "Elsevier",

}

TY - JOUR

T1 - Likelihood ratio calculation for a disputed-utterance analysis with limited available data

AU - Morrison, Geoffrey Stewart

AU - Lindh, Jonas

AU - Curran, James M.

PY - 2014/3

Y1 - 2014/3

N2 - We present a disputed-utterance analysis using relevant data, quantitative measurements and statistical models to calculate likelihood ratios. The acoustic data were taken from an actual forensic case in which the amount of data available to train the statistical models was small and the data point from the disputed word was far out on the tail of one of the modelled distributions. A procedure based on single multivariate Gaussian models for each hypothesis led to an unrealistically high likelihood ratio value with extremely poor reliability, but a procedure based on Hotelling's T² statistic and a procedure based on calculating a posterior predictive density produced more acceptable results. The Hotelling's T² procedure attempts to take account of the sampling uncertainty of the mean vectors and covariance matrices due to the small number of tokens used to train the models, and the posterior-predictive-density analysis integrates out the values of the mean vectors and covariance matrices as nuisance parameters. Data scarcity is common in forensic speech science and we argue that it is important not to accept extremely large calculated likelihood ratios at face value, but to consider whether such values can be supported given the size of the available data and modelling constraints.

KW - Disputed utterance

KW - Forensic

KW - Hotelling's T²

KW - Likelihood ratio

KW - Posterior predictive density

KW - Reliability

UR - http://www.scopus.com/inward/record.url?scp=84890288748&partnerID=8YFLogxK

UR - https://www.sciencedirect.com/science/article/pii/S0167639313001635

U2 - 10.1016/j.specom.2013.11.004

DO - 10.1016/j.specom.2013.11.004

M3 - Article

AN - SCOPUS:84890288748

SN - 0167-6393

VL - 58

SP - 81

EP - 90

JO - Speech Communication

JF - Speech Communication

ER -
