TY - GEN
T1 - Phoneme aware speech recognition through evolutionary optimisation
AU - Bird, Jordan J.
AU - Wanner, Elizabeth
AU - Ekárt, Anikó
AU - Faria, Diego R.
PY - 2019/7/13
Y1 - 2019/7/13
N2 - Phoneme awareness provides the path to high resolution speech recognition to overcome the difficulties of classical word recognition. Here we present the results of a preliminary study on Artificial Neural Network (ANN) and Hidden Markov Model (HMM) methods of classification for Human Speech Recognition through Diphthong Vowel sounds in the English Phonetic Alphabet, with a specific focus on evolutionary optimisation of bio-inspired classification methods. A set of audio clips was recorded by subjects from the United Kingdom and Mexico. For each recording, the data were pre-processed using Mel-Frequency Cepstral Coefficients (MFCC) at a sliding window of 200 ms per data object, as well as a further MFCC timeseries format for forecast-based models, to produce the dataset. We found that an evolutionary optimised deep neural network achieves 90.77% phoneme classification accuracy, as opposed to the best HMM of 150 hidden units achieving 86.23% accuracy. Many of the evolutionary solutions take substantially longer to train than the HMM; however, one solution scoring 87.5% (+1.27%) requires fewer resources than the HMM.
AB - Phoneme awareness provides the path to high resolution speech recognition to overcome the difficulties of classical word recognition. Here we present the results of a preliminary study on Artificial Neural Network (ANN) and Hidden Markov Model (HMM) methods of classification for Human Speech Recognition through Diphthong Vowel sounds in the English Phonetic Alphabet, with a specific focus on evolutionary optimisation of bio-inspired classification methods. A set of audio clips was recorded by subjects from the United Kingdom and Mexico. For each recording, the data were pre-processed using Mel-Frequency Cepstral Coefficients (MFCC) at a sliding window of 200 ms per data object, as well as a further MFCC timeseries format for forecast-based models, to produce the dataset. We found that an evolutionary optimised deep neural network achieves 90.77% phoneme classification accuracy, as opposed to the best HMM of 150 hidden units achieving 86.23% accuracy. Many of the evolutionary solutions take substantially longer to train than the HMM; however, one solution scoring 87.5% (+1.27%) requires fewer resources than the HMM.
KW - Artificial Neural Networks
KW - Computational Linguistics
KW - Evolutionary Optimisation
KW - Phoneme Awareness
KW - Speech Recognition
UR - http://www.scopus.com/inward/record.url?scp=85069182849&partnerID=8YFLogxK
UR - https://dl.acm.org/citation.cfm?doid=3319619.3321951
U2 - 10.1145/3319619.3321951
DO - 10.1145/3319619.3321951
M3 - Conference publication
T3 - GECCO 2019 Companion - Proceedings of the 2019 Genetic and Evolutionary Computation Conference Companion
SP - 362
EP - 363
BT - GECCO 2019 Companion - Proceedings of the 2019 Genetic and Evolutionary Computation Conference Companion
PB - ACM
T2 - 2019 Genetic and Evolutionary Computation Conference, GECCO 2019
Y2 - 13 July 2019 through 17 July 2019
ER -