Perception of concurrent sentences with harmonic or frequency-shifted voiced excitation: performance of human listeners and of computational models based on autocorrelation

Brian Roberts, Stephen D. Holmes, Christopher J. Darwin, Guy J. Brown

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Keyword identification in one of two simultaneous sentences is improved when the sentences differ in F0, particularly when they are almost continuously voiced. Sentences of this kind were recorded, monotonised using PSOLA, and re-synthesised to give a range of harmonic ΔF0s (0, 1, 3, and 10 semitones). They were additionally re-synthesised by LPC with the LPC residual frequency shifted by 25% of F0, to give excitation with inharmonic but regularly spaced components. Perceptual identification of frequency-shifted sentences showed a similarly large improvement with increasing nominal ΔF0 to that seen for harmonic sentences, although overall performance was about 10% poorer. We compared performance with that of two autocorrelation-based computational models comprising four stages: (i) peripheral frequency selectivity and half-wave rectification; (ii) within-channel periodicity extraction; (iii) identification of the two major peaks in the summary autocorrelation function (SACF); (iv) a template-based approach to speech recognition using dynamic time warping. One model sampled the correlogram at the target-F0 period and performed spectral matching; the other deselected channels dominated by the interferer and performed matching on the short-lag portion of the residual SACF. Both models reproduced the monotonic increase observed in human performance with increasing ΔF0 for the harmonic stimuli, but not for the frequency-shifted stimuli. A revised version of the spectral-matching model, which groups patterns of periodicity that lie on a curve in the frequency-delay plane, showed a closer match to the perceptual data for frequency-shifted sentences. The results extend the range of phenomena originally attributed to harmonic processing to grouping by common spectral pattern.
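The stimulus arithmetic and the four model stages listed above can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation and makes several loudly stated assumptions: all function names are hypothetical; equal-amplitude sine complexes stand in for the PSOLA/LPC-resynthesised sentences; a two-pole RBJ-cookbook band-pass biquad (Q = 4) stands in for a gammatone filterbank; the 8 kHz sample rate, the 12 log-spaced channels from 100 Hz to 3 kHz and the 2.5 ms minimum lag are arbitrary choices; and the DTW uses a plain Euclidean frame distance. A ΔF0 of s semitones corresponds to an F0 ratio of 2^(s/12), so 10 semitones is a ratio of about 1.78, and shifting every component of a complex on F0 by 0.25·F0 keeps the components F0 apart but makes them inharmonic.

```python
import math

def semitones_to_ratio(st):
    """F0 ratio for a difference of `st` semitones (12 semitones = one octave)."""
    return 2.0 ** (st / 12.0)

def complex_tone(f0, fs, n_samples, n_components=20, shift_fraction=0.0):
    """Equal-amplitude sine complex with components at (n + shift_fraction) * f0.
    shift_fraction=0 gives a harmonic complex; 0.25 keeps the components spaced
    by f0 but makes them inharmonic (not integer multiples of f0)."""
    freqs = [(n + shift_fraction) * f0 for n in range(1, n_components + 1)]
    freqs = [f for f in freqs if f < fs / 2]  # drop anything at or above Nyquist
    return [sum(math.sin(2 * math.pi * f * t / fs) for f in freqs)
            for t in range(n_samples)]

def bandpass(x, fc, fs, q=4.0):
    """Two-pole band-pass biquad (RBJ cookbook form, 0 dB peak gain).
    Hypothetical stand-in for one channel of a gammatone filterbank."""
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this filter
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        y.append(yn)
        x1, x2 = xn, x1                        # shift input history
        y1, y2 = yn, y1                        # shift output history
    return y

def autocorr(x, max_lag):
    """Running-sum (unnormalised) autocorrelation for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def summary_acf(signal, fs, centre_freqs, max_lag, skip=200):
    """Stages (i)-(ii): per-channel filtering, half-wave rectification and
    autocorrelation (the correlogram), summed across channels into the SACF."""
    sacf = [0.0] * (max_lag + 1)
    for fc in centre_freqs:
        chan = bandpass(signal, fc, fs)[skip:]   # discard the filter onset
        chan = [max(v, 0.0) for v in chan]       # half-wave rectification
        for k, r in enumerate(autocorr(chan, max_lag)):
            sacf[k] += r
    return sacf

def two_major_peaks(s, min_lag):
    """Stage (iii): lags of the two largest local maxima at lags >= min_lag."""
    peaks = [k for k in range(max(min_lag, 1), len(s) - 1)
             if s[k] > s[k - 1] and s[k] >= s[k + 1]]
    return sorted(peaks, key=lambda k: s[k], reverse=True)[:2]

def dtw_distance(a, b):
    """Stage (iv) helper: dynamic-time-warping distance between two sequences
    of feature vectors, with Euclidean distance between frames."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]

if __name__ == "__main__":
    fs = 8000
    centres = [100 * 30 ** (i / 11) for i in range(12)]   # 100 Hz to 3 kHz
    target = complex_tone(100.0, fs, 1000)
    masker = complex_tone(100.0 * semitones_to_ratio(10), fs, 1000)
    mixture = [t + m for t, m in zip(target, masker)]
    for name, sig in [("target", target), ("mixture", mixture)]:
        s = summary_acf(sig, fs, centres, max_lag=160)
        print(name, [k / fs * 1000 for k in two_major_peaks(s, 20)], "ms")
```

Passing `shift_fraction=0.25` to `complex_tone` shows why a frequency shift is awkward for models that sample the correlogram at one period: in a channel dominated by component n, the within-channel autocorrelation peak nearest the nominal period 1/F0 falls at n/((n + 0.25)F0) rather than at 1/F0, so these peaks drift with channel centre frequency instead of lining up across channels. That curved pattern in the frequency-delay plane is presumably the kind of structure the revised spectral-matching model groups.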
Original language: English
Title of host publication: The neurophysiological bases of auditory perception
Editors: Enrique A. Lopez-Poveda, Alan R. Palmer, Ray Meddis
Place of publication: New York (US)
Publisher: Springer
Pages: 521-531
Number of pages: 11
ISBN (Print): 978-1-4419-5685-9
Publication status: Published - 2010

Cite this

Roberts, B., Holmes, S. D., Darwin, C. J., & Brown, G. J. (2010). Perception of concurrent sentences with harmonic or frequency-shifted voiced excitation: performance of human listeners and of computational models based on autocorrelation. In E. A. Lopez-Poveda, A. R. Palmer, & R. Meddis (Eds.), The neurophysiological bases of auditory perception (pp. 521-531). New York (US): Springer.
Roberts, Brian ; Holmes, Stephen D. ; Darwin, Christopher J. ; Brown, Guy J. / Perception of concurrent sentences with harmonic or frequency-shifted voiced excitation : performance of human listeners and of computational models based on autocorrelation. The neurophysiological bases of auditory perception. editor / Enrique A. Lopez-Poveda ; Alan R. Palmer ; Ray Meddis. New York (US) : Springer, 2010. pp. 521-531
@inbook{d1e214ac4c9747b0ad18f7a9ef3363ba,
title = "Perception of concurrent sentences with harmonic or frequency-shifted voiced excitation: performance of human listeners and of computational models based on autocorrelation",
abstract = "Keyword identification in one of two simultaneous sentences is improved when the sentences differ in F0, particularly when they are almost continuously voiced. Sentences of this kind were recorded, monotonised using PSOLA, and re-synthesised to give a range of harmonic {$\Delta$}F0s (0, 1, 3, and 10 semitones). They were additionally re-synthesised by LPC with the LPC residual frequency shifted by 25{\%} of F0, to give excitation with inharmonic but regularly spaced components. Perceptual identification of frequency-shifted sentences showed a similarly large improvement with increasing nominal {$\Delta$}F0 to that seen for harmonic sentences, although overall performance was about 10{\%} poorer. We compared performance with that of two autocorrelation-based computational models comprising four stages: (i) peripheral frequency selectivity and half-wave rectification; (ii) within-channel periodicity extraction; (iii) identification of the two major peaks in the summary autocorrelation function (SACF); (iv) a template-based approach to speech recognition using dynamic time warping. One model sampled the correlogram at the target-F0 period and performed spectral matching; the other deselected channels dominated by the interferer and performed matching on the short-lag portion of the residual SACF. Both models reproduced the monotonic increase observed in human performance with increasing {$\Delta$}F0 for the harmonic stimuli, but not for the frequency-shifted stimuli. A revised version of the spectral-matching model, which groups patterns of periodicity that lie on a curve in the frequency-delay plane, showed a closer match to the perceptual data for frequency-shifted sentences. The results extend the range of phenomena originally attributed to harmonic processing to grouping by common spectral pattern.",
author = "Brian Roberts and Holmes, {Stephen D.} and Darwin, {Christopher J.} and Brown, {Guy J.}",
year = "2010",
language = "English",
isbn = "978-1-4419-5685-9",
pages = "521--531",
editor = "Lopez-Poveda, {Enrique A.} and Palmer, {Alan R.} and Ray Meddis",
booktitle = "The neurophysiological bases of auditory perception",
publisher = "Springer",
address = "New York (US)",

}

Roberts, B, Holmes, SD, Darwin, CJ & Brown, GJ 2010, Perception of concurrent sentences with harmonic or frequency-shifted voiced excitation: performance of human listeners and of computational models based on autocorrelation. in EA Lopez-Poveda, AR Palmer & R Meddis (eds), The neurophysiological bases of auditory perception. Springer, New York (US), pp. 521-531.

Perception of concurrent sentences with harmonic or frequency-shifted voiced excitation : performance of human listeners and of computational models based on autocorrelation. / Roberts, Brian; Holmes, Stephen D.; Darwin, Christopher J.; Brown, Guy J.

The neurophysiological bases of auditory perception. ed. / Enrique A. Lopez-Poveda; Alan R. Palmer; Ray Meddis. New York (US) : Springer, 2010. p. 521-531.

TY - CHAP

T1 - Perception of concurrent sentences with harmonic or frequency-shifted voiced excitation

T2 - performance of human listeners and of computational models based on autocorrelation

AU - Roberts, Brian

AU - Holmes, Stephen D.

AU - Darwin, Christopher J.

AU - Brown, Guy J.

PY - 2010

Y1 - 2010

AB - Keyword identification in one of two simultaneous sentences is improved when the sentences differ in F0, particularly when they are almost continuously voiced. Sentences of this kind were recorded, monotonised using PSOLA, and re-synthesised to give a range of harmonic ΔF0s (0, 1, 3, and 10 semitones). They were additionally re-synthesised by LPC with the LPC residual frequency shifted by 25% of F0, to give excitation with inharmonic but regularly spaced components. Perceptual identification of frequency-shifted sentences showed a similarly large improvement with increasing nominal ΔF0 to that seen for harmonic sentences, although overall performance was about 10% poorer. We compared performance with that of two autocorrelation-based computational models comprising four stages: (i) peripheral frequency selectivity and half-wave rectification; (ii) within-channel periodicity extraction; (iii) identification of the two major peaks in the summary autocorrelation function (SACF); (iv) a template-based approach to speech recognition using dynamic time warping. One model sampled the correlogram at the target-F0 period and performed spectral matching; the other deselected channels dominated by the interferer and performed matching on the short-lag portion of the residual SACF. Both models reproduced the monotonic increase observed in human performance with increasing ΔF0 for the harmonic stimuli, but not for the frequency-shifted stimuli. A revised version of the spectral-matching model, which groups patterns of periodicity that lie on a curve in the frequency-delay plane, showed a closer match to the perceptual data for frequency-shifted sentences. The results extend the range of phenomena originally attributed to harmonic processing to grouping by common spectral pattern.

UR - http://rd.springer.com/chapter/10.1007/978-1-4419-5686-6_48

M3 - Chapter

SN - 978-1-4419-5685-9

SP - 521

EP - 531

BT - The neurophysiological bases of auditory perception

A2 - Lopez-Poveda, Enrique A.

A2 - Palmer, Alan R.

A2 - Meddis, Ray

PB - Springer

CY - New York (US)

ER -

Roberts B, Holmes SD, Darwin CJ, Brown GJ. Perception of concurrent sentences with harmonic or frequency-shifted voiced excitation: performance of human listeners and of computational models based on autocorrelation. In Lopez-Poveda EA, Palmer AR, Meddis R, editors, The neurophysiological bases of auditory perception. New York (US): Springer. 2010. p. 521-531