TY - JOUR
T1 - Likelihood ratio calculation for a disputed-utterance analysis with limited available data
AU - Morrison, Geoffrey Stewart
AU - Lindh, Jonas
AU - Curran, James M.
PY - 2014/3
Y1 - 2014/3
AB - We present a disputed-utterance analysis using relevant data, quantitative measurements and statistical models to calculate likelihood ratios. The acoustic data were taken from an actual forensic case in which the amount of data available to train the statistical models was small and the data point from the disputed word was far out on the tail of one of the modelled distributions. A procedure based on single multivariate Gaussian models for each hypothesis led to an unrealistically high likelihood ratio value with extremely poor reliability, but a procedure based on Hotelling's T² statistic and a procedure based on calculating a posterior predictive density produced more acceptable results. The Hotelling's T² procedure attempts to take account of the sampling uncertainty of the mean vectors and covariance matrices due to the small number of tokens used to train the models, and the posterior-predictive-density analysis integrates out the values of the mean vectors and covariance matrices as nuisance parameters. Data scarcity is common in forensic speech science and we argue that it is important not to accept extremely large calculated likelihood ratios at face value, but to consider whether such values can be supported given the size of the available data and modelling constraints.
KW - Disputed utterance
KW - Forensic
KW - Hotelling's T²
KW - Likelihood ratio
KW - Posterior predictive density
KW - Reliability
UR - http://www.scopus.com/inward/record.url?scp=84890288748&partnerID=8YFLogxK
UR - https://www.sciencedirect.com/science/article/pii/S0167639313001635
U2 - 10.1016/j.specom.2013.11.004
DO - 10.1016/j.specom.2013.11.004
M3 - Article
AN - SCOPUS:84890288748
VL - 58
SP - 81
EP - 90
JO - Speech Communication
JF - Speech Communication
SN - 0167-6393
ER -