A Supervised Approach to Global Signal-to-Noise Ratio Estimation for Whispered and Pathological Voices

Amir Hossein Poorjam; Max A Little; Jesper Rindom Jensen; Mads Græsbøll Christensen

doi:10.1109/ICASSP.2018.8462459

Research output: Chapter in Book/Published conference output › Conference publication

Abstract

The presence of background noise in signals adversely affects the performance of many speech-based algorithms. Accurate estimation of signal-to-noise-ratio (SNR), as a measure of noise level in a signal, can help in compensating for noise effects. Most existing SNR estimation methods have been developed for normal speech and might not provide accurate estimation for special speech types such as whispered or disordered voices, particularly, when they are corrupted by non-stationary noises. In this paper, we first investigate the impact of stationary and non-stationary noise on the behavior of mel-frequency cepstral coefficients (MFCCs) extracted from normal, whispered and pathological voices. We demonstrate that, regardless of the speech type, the mean and the covariance of MFCCs are predictably modified by additive noise and the amount of change is related to the noise level. Then, we propose a new supervised method for SNR estimation which is based on a regression model trained on MFCCs of the noisy signals. Experimental results show that the proposed approach provides accurate estimation and consistent performance for various speech types under different noise conditions.
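As an illustration of the quantity being estimated, the sketch below computes a global SNR (total signal power over total noise power, in dB) and scales a noise signal so that a mixture reaches a chosen global SNR. Mixing at a known SNR is one common way to obtain labelled noisy signals for a supervised estimator, but the abstract does not say how the authors built their training data, so this is an assumption and not their procedure. The function names `global_snr_db` and `mix_at_snr` are hypothetical, and the sine wave is only a stand-in for a speech recording.

```python
import numpy as np


def global_snr_db(clean, noise):
    """Global SNR in dB: total signal power over total noise power."""
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))


def mix_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so that clean + scaled noise has the requested global SNR.

    Returns the noisy mixture and the scaled noise component.
    """
    clean_power = np.sum(clean ** 2)
    noise_power = np.sum(noise ** 2)
    # Solve 10*log10(clean_power / (gain**2 * noise_power)) = target_snr_db for gain.
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (target_snr_db / 10.0)))
    scaled_noise = gain * noise
    return clean + scaled_noise, scaled_noise


# Toy signals (assumptions for illustration only): a 220 Hz tone and white noise.
rng = np.random.default_rng(0)
sample_rate = 16000
clean = np.sin(2 * np.pi * 220 * np.arange(sample_rate) / sample_rate)
noise = rng.standard_normal(sample_rate)

noisy, scaled_noise = mix_at_snr(clean, noise, target_snr_db=10.0)
achieved_snr_db = global_snr_db(clean, scaled_noise)
```

In a supervised setup along the lines the abstract describes, features extracted from `noisy` (for example MFCC statistics) would be the regression input and the known SNR would be the target; the feature extraction and the regression model themselves are not shown here.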

Original language	English
Title of host publication	2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher	IEEE
Pages	296-300
ISBN (Electronic)	978-1-5386-4658-8
ISBN (Print)	978-1-5386-4659-5
DOIs	https://doi.org/10.1109/ICASSP.2018.8462459
Publication status	Published - 13 Sept 2018
Event	2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Calgary, Canada Duration: 15 Apr 2018 → 20 Apr 2018

Publication series

Name	2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher	IEEE
ISSN (Electronic)	2379-190X

Conference

Conference	2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Country/Territory	Canada
City	Calgary
Period	15/04/18 → 20/04/18

Bibliographical note

© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Access to Document

10.1109/ICASSP.2018.8462459

Accepted author manuscript, 720 KB

Cite this

Poorjam, A. H., Little, M. A., Jensen, J. R., & Christensen, M. G. (2018). A Supervised Approach to Global Signal-to-Noise Ratio Estimation for Whispered and Pathological Voices. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 296-300). (2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)). IEEE. https://doi.org/10.1109/ICASSP.2018.8462459

Poorjam, Amir Hossein ; Little, Max A ; Jensen, Jesper Rindom et al. / A Supervised Approach to Global Signal-to-Noise Ratio Estimation for Whispered and Pathological Voices. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018. pp. 296-300 (2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)).

@inproceedings{60124f567cb849c3b29c642c15ba9e75,
  title = "A Supervised Approach to Global Signal-to-Noise Ratio Estimation for Whispered and Pathological Voices",
  abstract = "The presence of background noise in signals adversely affects the performance of many speech-based algorithms. Accurate estimation of signal-to-noise-ratio (SNR), as a measure of noise level in a signal, can help in compensating for noise effects. Most existing SNR estimation methods have been developed for normal speech and might not provide accurate estimation for special speech types such as whispered or disordered voices, particularly, when they are corrupted by non-stationary noises. In this paper, we first investigate the impact of stationary and non-stationary noise on the behavior of mel-frequency cepstral coefficients (MFCCs) extracted from normal, whispered and pathological voices. We demonstrate that, regardless of the speech type, the mean and the covariance of MFCCs are predictably modified by additive noise and the amount of change is related to the noise level. Then, we propose a new supervised method for SNR estimation which is based on a regression model trained on MFCCs of the noisy signals. Experimental results show that the proposed approach provides accurate estimation and consistent performance for various speech types under different noise conditions.",
  author = "Poorjam, {Amir Hossein} and Little, {Max A} and Jensen, {Jesper Rindom} and Christensen, {Mads Gr{\ae}sb{\o}ll}",
  note = "{\textcopyright} 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. ; 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ; Conference date: 15-04-2018 Through 20-04-2018",
  year = "2018",
  month = sep,
  day = "13",
  doi = "10.1109/ICASSP.2018.8462459",
  language = "English",
  isbn = "978-1-5386-4659-5",
  series = "2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)",
  publisher = "IEEE",
  pages = "296--300",
  booktitle = "2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)",
  address = "United States",
}

Poorjam, AH, Little, MA, Jensen, JR & Christensen, MG 2018, A Supervised Approach to Global Signal-to-Noise Ratio Estimation for Whispered and Pathological Voices. in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp. 296-300, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Canada, 15/04/18. https://doi.org/10.1109/ICASSP.2018.8462459

A Supervised Approach to Global Signal-to-Noise Ratio Estimation for Whispered and Pathological Voices. / Poorjam, Amir Hossein; Little, Max A; Jensen, Jesper Rindom et al.
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018. p. 296-300 (2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)).

TY - GEN

T1 - A Supervised Approach to Global Signal-to-Noise Ratio Estimation for Whispered and Pathological Voices

AU - Poorjam, Amir Hossein

AU - Little, Max A

AU - Jensen, Jesper Rindom

AU - Christensen, Mads Græsbøll

N1 - © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

PY - 2018/9/13

Y1 - 2018/9/13

N2 - The presence of background noise in signals adversely affects the performance of many speech-based algorithms. Accurate estimation of signal-to-noise-ratio (SNR), as a measure of noise level in a signal, can help in compensating for noise effects. Most existing SNR estimation methods have been developed for normal speech and might not provide accurate estimation for special speech types such as whispered or disordered voices, particularly, when they are corrupted by non-stationary noises. In this paper, we first investigate the impact of stationary and non-stationary noise on the behavior of mel-frequency cepstral coefficients (MFCCs) extracted from normal, whispered and pathological voices. We demonstrate that, regardless of the speech type, the mean and the covariance of MFCCs are predictably modified by additive noise and the amount of change is related to the noise level. Then, we propose a new supervised method for SNR estimation which is based on a regression model trained on MFCCs of the noisy signals. Experimental results show that the proposed approach provides accurate estimation and consistent performance for various speech types under different noise conditions.

AB - The presence of background noise in signals adversely affects the performance of many speech-based algorithms. Accurate estimation of signal-to-noise-ratio (SNR), as a measure of noise level in a signal, can help in compensating for noise effects. Most existing SNR estimation methods have been developed for normal speech and might not provide accurate estimation for special speech types such as whispered or disordered voices, particularly, when they are corrupted by non-stationary noises. In this paper, we first investigate the impact of stationary and non-stationary noise on the behavior of mel-frequency cepstral coefficients (MFCCs) extracted from normal, whispered and pathological voices. We demonstrate that, regardless of the speech type, the mean and the covariance of MFCCs are predictably modified by additive noise and the amount of change is related to the noise level. Then, we propose a new supervised method for SNR estimation which is based on a regression model trained on MFCCs of the noisy signals. Experimental results show that the proposed approach provides accurate estimation and consistent performance for various speech types under different noise conditions.

UR - https://ieeexplore.ieee.org/document/8462459/?tp=&arnumber=8462459&contentType=Conferences&dld=YXN0b24uYWMudWs%3D&source=SEARCHALERT

U2 - 10.1109/ICASSP.2018.8462459

DO - 10.1109/ICASSP.2018.8462459

M3 - Conference publication

SN - 978-1-5386-4659-5

T3 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

SP - 296

EP - 300

BT - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

PB - IEEE

T2 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Y2 - 15 April 2018 through 20 April 2018

ER -

Poorjam AH, Little MA, Jensen JR, Christensen MG. A Supervised Approach to Global Signal-to-Noise Ratio Estimation for Whispered and Pathological Voices. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. 2018. p. 296-300. (2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)). doi: 10.1109/ICASSP.2018.8462459
