The intelligibility of noise-vocoded speech: spectral information available from across-channel comparison of amplitude envelopes

Brian Roberts, Robert J. Summers, Peter J. Bailey

Research output: Contribution to journal, Article

Abstract

Noise-vocoded (NV) speech is often regarded as conveying phonetic information primarily through temporal-envelope cues rather than spectral cues. However, listeners may infer the formant frequencies in the vocal-tract output, a key source of phonetic detail, from across-band differences in amplitude when speech is processed through a small number of channels. The potential utility of this spectral information was assessed for NV speech created by filtering sentences into six frequency bands, and using the amplitude envelope of each band (≤30 Hz) to modulate a matched noise-band carrier (N). Bands were paired, corresponding to F1 (≈N1 + N2), F2 (≈N3 + N4) and the higher formants (F3' ≈ N5 + N6), such that the frequency contour of each formant was implied by variations in relative amplitude between bands within the corresponding pair. Three-formant analogues (F0 = 150 Hz) of the NV stimuli were synthesized using frame-by-frame reconstruction of the frequency and amplitude of each formant. These analogues were less intelligible than the NV stimuli or analogues created using contours extracted from spectrograms of the original sentences, but more intelligible than when the frequency contours were replaced with constant (mean) values. Across-band comparisons of amplitude envelopes in NV speech can provide phonetically important information about the frequency contours of the underlying formants.
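
As a rough illustration of how a formant frequency contour can be implied by the relative amplitude of two paired bands, the following minimal Python sketch interpolates between two band centre frequencies, frame by frame. The weighting rule, the log-frequency interpolation, the function names, and the centre frequencies are all assumptions made for illustration; the abstract does not specify the reconstruction procedure, so this should not be read as the authors' method.

```python
import math


def implied_formant_hz(amp_low, amp_high, fc_low, fc_high):
    """Amplitude-weighted interpolation between two band centres (Hz).

    Assumption: weighting by relative amplitude on a log-frequency axis
    is an illustrative choice, not the procedure reported in the paper.
    """
    total = amp_low + amp_high
    if total <= 0.0:
        return None  # silent frame: no formant frequency is implied
    w_high = amp_high / total  # share of energy in the upper band
    log_f = (1.0 - w_high) * math.log(fc_low) + w_high * math.log(fc_high)
    return math.exp(log_f)


def implied_contour(env_low, env_high, fc_low, fc_high):
    """Frame-by-frame contour from two per-frame amplitude envelopes."""
    return [
        implied_formant_hz(a, b, fc_low, fc_high)
        for a, b in zip(env_low, env_high)
    ]


# Hypothetical centre frequencies (Hz) for the band pair standing in for F1.
FC_N1, FC_N2 = 300.0, 700.0

# Toy envelopes: energy moves from the lower band to the upper band,
# then both bands fall silent in the final frame.
env_n1 = [1.0, 0.5, 0.0, 0.0]
env_n2 = [0.0, 0.5, 1.0, 0.0]

contour = implied_contour(env_n1, env_n2, FC_N1, FC_N2)
```

In this toy case the implied frequency rises from the lower band centre to the upper one as energy shifts between the bands, and the silent frame yields no estimate. The same function could be applied to the other band pairs with their own (again hypothetical) centre frequencies.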
Original language: English
Pages (from-to): 1595-1600
Number of pages: 6
Journal: Proceedings of the Royal Society: Series B
Volume: 278
Issue number: 1711
Early online date: 10 Nov 2010
DOI: 10.1098/rspb.2010.1554
Publication status: Published - 22 May 2011

Bibliographical note


© 2010 The Royal Society. The intelligibility of noise-vocoded speech: spectral information available from across-channel comparison of amplitude envelopes. Brian Roberts, Robert J. Summers, Peter J. Bailey.
Published 10 November 2010. DOI: 10.1098/rspb.2010.1554

Keywords

  • noise-vocoded speech
  • spectral cues
  • formant frequencies
  • intelligibility

Cite this

@article{bff3fe1ab33749e3af489b656210bf18,
title = "The intelligibility of noise-vocoded speech: spectral information available from across-channel comparison of amplitude envelopes",
abstract = "Noise-vocoded (NV) speech is often regarded as conveying phonetic information primarily through temporal-envelope cues rather than spectral cues. However, listeners may infer the formant frequencies in the vocal-tract output, a key source of phonetic detail, from across-band differences in amplitude when speech is processed through a small number of channels. The potential utility of this spectral information was assessed for NV speech created by filtering sentences into six frequency bands, and using the amplitude envelope of each band ($\leq$30 Hz) to modulate a matched noise-band carrier (N). Bands were paired, corresponding to F1 ($\approx$N1 + N2), F2 ($\approx$N3 + N4) and the higher formants (F3' $\approx$ N5 + N6), such that the frequency contour of each formant was implied by variations in relative amplitude between bands within the corresponding pair. Three-formant analogues (F0 = 150 Hz) of the NV stimuli were synthesized using frame-by-frame reconstruction of the frequency and amplitude of each formant. These analogues were less intelligible than the NV stimuli or analogues created using contours extracted from spectrograms of the original sentences, but more intelligible than when the frequency contours were replaced with constant (mean) values. Across-band comparisons of amplitude envelopes in NV speech can provide phonetically important information about the frequency contours of the underlying formants.",
keywords = "noise-vocoded speech, spectral cues, formant frequencies, intelligibility",
author = "Roberts, Brian and Summers, {Robert J.} and Bailey, {Peter J.}",
note = "{\textcopyright} 2010 The Royal Society. The intelligibility of noise-vocoded speech: spectral information available from across-channel comparison of amplitude envelopes. Brian Roberts, Robert J. Summers, Peter J. Bailey. Published 10 November 2010. DOI: 10.1098/rspb.2010.1554",
year = "2011",
month = "5",
day = "22",
doi = "10.1098/rspb.2010.1554",
language = "English",
volume = "278",
pages = "1595--1600",
journal = "Proceedings of the Royal Society: Series B",
issn = "0962-8452",
publisher = "The Royal Society",
number = "1711",

}

The intelligibility of noise-vocoded speech: spectral information available from across-channel comparison of amplitude envelopes. / Roberts, Brian; Summers, Robert J.; Bailey, Peter J.

In: Proceedings of the Royal Society: Series B, Vol. 278, No. 1711, 22.05.2011, p. 1595-1600.


TY  - JOUR
T1  - The intelligibility of noise-vocoded speech
T2  - spectral information available from across-channel comparison of amplitude envelopes
AU  - Roberts, Brian
AU  - Summers, Robert J.
AU  - Bailey, Peter J.
N1  - © 2010 The Royal Society. The intelligibility of noise-vocoded speech: spectral information available from across-channel comparison of amplitude envelopes. Brian Roberts, Robert J. Summers, Peter J. Bailey. Published 10 November 2010. DOI: 10.1098/rspb.2010.1554
PY  - 2011/5/22
Y1  - 2011/5/22
N2  - Noise-vocoded (NV) speech is often regarded as conveying phonetic information primarily through temporal-envelope cues rather than spectral cues. However, listeners may infer the formant frequencies in the vocal-tract output, a key source of phonetic detail, from across-band differences in amplitude when speech is processed through a small number of channels. The potential utility of this spectral information was assessed for NV speech created by filtering sentences into six frequency bands, and using the amplitude envelope of each band (≤30 Hz) to modulate a matched noise-band carrier (N). Bands were paired, corresponding to F1 (≈N1 + N2), F2 (≈N3 + N4) and the higher formants (F3' ≈ N5 + N6), such that the frequency contour of each formant was implied by variations in relative amplitude between bands within the corresponding pair. Three-formant analogues (F0 = 150 Hz) of the NV stimuli were synthesized using frame-by-frame reconstruction of the frequency and amplitude of each formant. These analogues were less intelligible than the NV stimuli or analogues created using contours extracted from spectrograms of the original sentences, but more intelligible than when the frequency contours were replaced with constant (mean) values. Across-band comparisons of amplitude envelopes in NV speech can provide phonetically important information about the frequency contours of the underlying formants.
AB  - Noise-vocoded (NV) speech is often regarded as conveying phonetic information primarily through temporal-envelope cues rather than spectral cues. However, listeners may infer the formant frequencies in the vocal-tract output, a key source of phonetic detail, from across-band differences in amplitude when speech is processed through a small number of channels. The potential utility of this spectral information was assessed for NV speech created by filtering sentences into six frequency bands, and using the amplitude envelope of each band (≤30 Hz) to modulate a matched noise-band carrier (N). Bands were paired, corresponding to F1 (≈N1 + N2), F2 (≈N3 + N4) and the higher formants (F3' ≈ N5 + N6), such that the frequency contour of each formant was implied by variations in relative amplitude between bands within the corresponding pair. Three-formant analogues (F0 = 150 Hz) of the NV stimuli were synthesized using frame-by-frame reconstruction of the frequency and amplitude of each formant. These analogues were less intelligible than the NV stimuli or analogues created using contours extracted from spectrograms of the original sentences, but more intelligible than when the frequency contours were replaced with constant (mean) values. Across-band comparisons of amplitude envelopes in NV speech can provide phonetically important information about the frequency contours of the underlying formants.
KW  - noise-vocoded speech
KW  - spectral cues
KW  - formant frequencies
KW  - intelligibility
UR  - http://www.scopus.com/inward/record.url?scp=79954547351&partnerID=8YFLogxK
UR  - http://rspb.royalsocietypublishing.org/content/278/1711/1595
U2  - 10.1098/rspb.2010.1554
DO  - 10.1098/rspb.2010.1554
M3  - Article
VL  - 278
SP  - 1595
EP  - 1600
JO  - Proceedings of the Royal Society: Series B
JF  - Proceedings of the Royal Society: Series B
SN  - 0962-8452
IS  - 1711
ER  -