Phoneme aware speech recognition through evolutionary optimisation

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Phoneme awareness provides a path to high-resolution speech recognition that overcomes the difficulties of classical word recognition. Here we present the results of a preliminary study on Artificial Neural Network (ANN) and Hidden Markov Model (HMM) methods of classification for human speech recognition through diphthong vowel sounds in the English phonetic alphabet, with a specific focus on evolutionary optimisation of bio-inspired classification methods. A set of audio clips was recorded by subjects from the United Kingdom and Mexico. Each recording was pre-processed using Mel-Frequency Cepstral Coefficients (MFCC) with a sliding window of 200 ms per data object, and additionally into an MFCC time-series format for forecast-based models, to produce the dataset. We found that an evolutionarily optimised deep neural network achieves 90.77% phoneme classification accuracy, compared with 86.23% for the best HMM, which has 150 hidden units. Many of the evolutionary solutions take substantially longer to train than the HMM; however, one solution scoring 87.5% (+1.27 percentage points) requires fewer resources than the HMM.
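
The 200 ms windowing step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `frame_signal`, the 16 kHz sample rate in the usage note, and the non-overlapping hop (hop equal to window length) are all assumptions made for illustration, since the abstract does not state them. The MFCC computation itself is omitted; each returned window would be passed to an MFCC routine of the reader's choice.

```python
from typing import List, Sequence


def frame_signal(samples: Sequence[float], sample_rate: int,
                 window_ms: int = 200, hop_ms: int = 200) -> List[List[float]]:
    """Split a mono signal into fixed-length windows.

    Assumption: hop_ms defaults to window_ms (no overlap); the abstract
    only states the 200 ms window length. A trailing partial window is dropped.
    """
    if sample_rate <= 0 or window_ms <= 0 or hop_ms <= 0:
        raise ValueError("sample_rate, window_ms and hop_ms must be positive")

    # Convert durations in milliseconds to sample counts.
    window_len = sample_rate * window_ms // 1000
    hop_len = sample_rate * hop_ms // 1000
    if window_len == 0 or hop_len == 0:
        raise ValueError("sample_rate too low for the requested durations")

    frames: List[List[float]] = []
    start = 0
    # Slide the window forward until it no longer fits in the signal.
    while start + window_len <= len(samples):
        frames.append(list(samples[start:start + window_len]))
        start += hop_len
    return frames
```

For example, under the assumed 16 kHz sample rate, one second of audio yields five non-overlapping windows of 3200 samples each; passing `hop_ms=100` instead produces overlapping windows.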

Original language: English
Title of host publication: GECCO 2019 Companion - Proceedings of the 2019 Genetic and Evolutionary Computation Conference Companion
Publisher: ACM
Pages: 362-363
Number of pages: 2
ISBN (Electronic): 9781450367486
DOI: https://doi.org/10.1145/3319619.3321951
Publication status: Published - 13 Jul 2019
Event: 2019 Genetic and Evolutionary Computation Conference, GECCO 2019 - Prague, Czech Republic
Duration: 13 Jul 2019 to 17 Jul 2019

Publication series

Name: GECCO 2019 Companion - Proceedings of the 2019 Genetic and Evolutionary Computation Conference Companion

Conference

Conference: 2019 Genetic and Evolutionary Computation Conference, GECCO 2019
Country: Czech Republic
City: Prague
Period: 13/07/19 to 17/07/19

Keywords

  • Artificial Neural Networks
  • Computational Linguistics
  • Evolutionary Optimisation
  • Phoneme Awareness
  • Speech Recognition

Cite this

Bird, J. J., Wanner, E., Ekárt, A., & Faria, D. R. (2019). Phoneme aware speech recognition through evolutionary optimisation. In GECCO 2019 Companion - Proceedings of the 2019 Genetic and Evolutionary Computation Conference Companion (pp. 362-363). (GECCO 2019 Companion - Proceedings of the 2019 Genetic and Evolutionary Computation Conference Companion). ACM. https://doi.org/10.1145/3319619.3321951
@inproceedings{208c09b5adc84a35a40a88e99a5db8f6,
title = "Phoneme aware speech recognition through evolutionary optimisation",
keywords = "Artificial Neural Networks, Computational Linguistics, Evolutionary Optimisation, Phoneme Awareness, Speech Recognition",
author = "Bird, {Jordan J.} and Elizabeth Wanner and Anik{\'o} Ek{\'a}rt and Faria, {Diego R.}",
year = "2019",
month = "7",
day = "13",
doi = "10.1145/3319619.3321951",
language = "English",
series = "GECCO 2019 Companion - Proceedings of the 2019 Genetic and Evolutionary Computation Conference Companion",
publisher = "ACM",
pages = "362--363",
booktitle = "GECCO 2019 Companion - Proceedings of the 2019 Genetic and Evolutionary Computation Conference Companion",
address = "United States",

}

TY - GEN

T1 - Phoneme aware speech recognition through evolutionary optimisation

AU - Bird, Jordan J.

AU - Wanner, Elizabeth

AU - Ekárt, Anikó

AU - Faria, Diego R.

PY - 2019/7/13

Y1 - 2019/7/13

KW - Artificial Neural Networks

KW - Computational Linguistics

KW - Evolutionary Optimisation

KW - Phoneme Awareness

KW - Speech Recognition

UR - http://www.scopus.com/inward/record.url?scp=85069182849&partnerID=8YFLogxK

UR - https://dl.acm.org/citation.cfm?doid=3319619.3321951

U2 - 10.1145/3319619.3321951

DO - 10.1145/3319619.3321951

M3 - Conference contribution

AN - SCOPUS:85069182849

T3 - GECCO 2019 Companion - Proceedings of the 2019 Genetic and Evolutionary Computation Conference Companion

SP - 362

EP - 363

BT - GECCO 2019 Companion - Proceedings of the 2019 Genetic and Evolutionary Computation Conference Companion

PB - ACM

ER -
