Perception of concurrent sentences with harmonic or frequency-shifted voiced excitation: performance of human listeners and of computational models based on autocorrelation

Brian Roberts, Stephen D. Holmes, Christopher J. Darwin, Guy J. Brown

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

Abstract

Keyword identification in one of two simultaneous sentences is improved when the sentences differ in F0, particularly when they are almost continuously voiced. Sentences of this kind were recorded, monotonised using PSOLA, and re-synthesised to give a range of harmonic ΔF0s (0, 1, 3, and 10 semitones). They were additionally re-synthesised by LPC with the LPC residual frequency shifted by 25% of F0, to give excitation with inharmonic but regularly spaced components. Perceptual identification of frequency-shifted sentences showed a similar large improvement with nominal ΔF0 as seen for harmonic sentences, although overall performance was about 10% poorer. We compared performance with that of two autocorrelation-based computational models comprising four stages: (i) peripheral frequency selectivity and half-wave rectification; (ii) within-channel periodicity extraction; (iii) identification of the two major peaks in the summary autocorrelation function (SACF); (iv) a template-based approach to speech recognition using dynamic time warping. One model sampled the correlogram at the target-F0 period and performed spectral matching; the other deselected channels dominated by the interferer and performed matching on the short-lag portion of the residual SACF. Both models reproduced the monotonic increase observed in human performance with increasing ΔF0 for the harmonic stimuli, but not for the frequency-shifted stimuli. A revised version of the spectral-matching model, which groups patterns of periodicity that lie on a curve in the frequency-delay plane, showed a closer match to the perceptual data for frequency-shifted sentences. The results extend the range of phenomena originally attributed to harmonic processing to grouping by common spectral pattern.
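
The stimulus arithmetic in the abstract can be illustrated with a minimal sketch. It is not the authors' synthesis code: the 100 Hz target F0, the function names, and the choice of four components are assumptions made only for illustration. It converts each ΔF0 in semitones to a frequency ratio, and shows how shifting every harmonic by 25% of F0 gives components that are no longer integer multiples of F0 but are still spaced F0 apart.

```python
from typing import List


def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for an interval given in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def component_frequencies(f0: float, n: int, shift_fraction: float = 0.0) -> List[float]:
    """First n excitation components: harmonics k*f0, each shifted by shift_fraction*f0."""
    return [k * f0 + shift_fraction * f0 for k in range(1, n + 1)]


# Hypothetical target F0 in Hz (not a value taken from the chapter).
TARGET_F0 = 100.0

# Interferer F0 for each of the ΔF0 values listed in the abstract.
for delta in (0, 1, 3, 10):
    interferer_f0 = TARGET_F0 * semitones_to_ratio(delta)
    print(f"dF0 = {delta:2d} semitones -> interferer F0 = {interferer_f0:.1f} Hz")

harmonic = component_frequencies(TARGET_F0, 4)        # [100.0, 200.0, 300.0, 400.0]
shifted = component_frequencies(TARGET_F0, 4, 0.25)   # [125.0, 225.0, 325.0, 425.0]
print(harmonic)
print(shifted)
```

In the shifted case the spacing between adjacent components remains 100 Hz, which is why the abstract describes the excitation as inharmonic but regularly spaced.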
Original language: English
Title of host publication: The neurophysiological bases of auditory perception
Editors: Enrique A. Lopez-Poveda, Alan R. Palmer, Ray Meddis
Place of publication: New York (US)
Publisher: Springer
Pages: 521-531
Number of pages: 11
ISBN (Print): 978-1-4419-5685-9
Publication status: Published - 2010

  • Cite this

    Roberts, B., Holmes, S. D., Darwin, C. J., & Brown, G. J. (2010). Perception of concurrent sentences with harmonic or frequency-shifted voiced excitation: performance of human listeners and of computational models based on autocorrelation. In E. A. Lopez-Poveda, A. R. Palmer, & R. Meddis (Eds.), The neurophysiological bases of auditory perception (pp. 521-531). Springer.