Keyword identification in one of two simultaneous sentences is improved when the sentences differ in F0, particularly when they are almost continuously voiced. Sentences of this kind were recorded, monotonised using PSOLA, and re-synthesised to give a range of harmonic ?F0s (0, 1, 3, and 10 semitones). They were additionally re-synthesised by LPC with the LPC residual frequency shifted by 25% of F0, to give excitation with inharmonic but regularly spaced components. Perceptual identification of frequency-shifted sentences showed a similar large improvement with nominal ?F0 as seen for harmonic sentences, although overall performance was about 10% poorer. We compared performance with that of two autocorrelation-based computational models comprising four stages: (i) peripheral frequency selectivity and half-wave rectification; (ii) within-channel periodicity extraction; (iii) identification of the two major peaks in the summary autocorrelation function (SACF); (iv) a template-based approach to speech recognition using dynamic time warping. One model sampled the correlogram at the target-F0 period and performed spectral matching; the other deselected channels dominated by the interferer and performed matching on the short-lag portion of the residual SACF. Both models reproduced the monotonic increase observed in human performance with increasing ?F0 for the harmonic stimuli, but not for the frequency-shifted stimuli. A revised version of the spectral-matching model, which groups patterns of periodicity that lie on a curve in the frequency-delay plane, showed a closer match to the perceptual data for frequency-shifted sentences. The results extend the range of phenomena originally attributed to harmonic processing to grouping by common spectral pattern.
|Title of host publication||The neurophysiological nases of auditory perception|
|Editors||Enrique A. Lopez-Poveda, Alan R. Palmer, Ray Meddis|
|Place of Publication||New York (US)|
|Number of pages||11|
|Publication status||Published - 2010|