
Language Accent Detection with CNN Using Sparse Data from a Crowd-Sourced Speech Archive

  • Veranika Mikhailava
  • Mariia Lesnichaia
  • Natalia Bogach
  • Iurii Lezhenin
  • John Blake
  • Evgeny Pyshkin

Research output: Contribution to journal › Article › peer-review

17 Citations (SciVal)

Abstract

The problem of accent recognition has received considerable attention with the development of Automatic Speech Recognition (ASR) systems. The crux of the problem is that conventional acoustic language models adapted to fit standard language corpora cannot satisfy the recognition requirements for accented speech. In this research, we contribute to the accent recognition task for a group of up to nine European accents in English. We offer some evidence in favor of specific hyperparameter choices for neural network models and search for the input speech signal parameters that best improve on the baseline accent recognition accuracy. Specifically, we used a CNN-based model trained on audio features extracted from the Speech Accent Archive dataset, a crowd-sourced collection of accented speech recordings. We show that adding time–frequency and energy features (such as spectrogram, chromagram, spectral centroid, spectral rolloff, and fundamental frequency) to the Mel-frequency cepstral coefficients (MFCC) may increase the accuracy of accent classification compared to the conventional feature sets of MFCC and/or raw spectrograms. Our experiments demonstrate that the greatest impact comes from amplitude mel-spectrograms on a linear scale fed into the model. These mel-spectrograms, which are correlates of the audio signal energy, produce state-of-the-art classification results: the recognition accuracy for English with Germanic, Romance, and Slavic accents ranges from 0.964 to 0.987, outperforming existing accent classification models that use the Speech Accent Archive. We also investigated how speech rhythm affects the recognition accuracy. Based on our preliminary experiments, we used the audio recordings in their original form (i.e., with all pauses preserved) for the other accent classification experiments.
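As a rough illustration of the input feature the abstract highlights, the sketch below computes an amplitude mel-spectrogram on a linear scale (no logarithm or dB conversion) using only the Python standard library. It is not the authors' pipeline: the frame size, hop length, number of mel bands, Hann window, and the common 2595·log10(1 + f/700) mel formula are all assumptions chosen for the example, and the function names are hypothetical.

```python
import cmath
import math


def hz_to_mel(f):
    # Common HTK-style mel formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters over the n_fft // 2 + 1 non-negative DFT bins."""
    top = hz_to_mel(sr / 2.0)
    # n_mels + 2 points equally spaced on the mel axis, mapped back to Hz.
    hz_pts = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, ctr, hi = hz_pts[m - 1], hz_pts[m], hz_pts[m + 1]
        row = []
        for f in bin_freqs:
            rising = (f - lo) / (ctr - lo)
            falling = (hi - f) / (hi - ctr)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def magnitude_spectrum(frame):
    """Naive DFT magnitudes; slow, but dependency-free for a short demo."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def amplitude_mel_spectrogram(signal, sr, n_fft=256, hop=128, n_mels=20):
    """Return a list of frames, each a list of n_mels linear amplitudes."""
    bank = mel_filterbank(n_mels, n_fft, sr)
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * t / n_fft)
              for t in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(n_fft)]
        mag = magnitude_spectrum(frame)
        # Linear scale: weighted sums of magnitudes, with no log applied.
        frames.append([sum(w * a for w, a in zip(row, mag)) for row in bank])
    return frames


if __name__ == "__main__":
    sr = 8000  # hypothetical sample rate for a synthetic test tone
    tone = [math.sin(2.0 * math.pi * 1000.0 * t / sr) for t in range(1024)]
    spec = amplitude_mel_spectrogram(tone, sr)
    print(len(spec), "frames x", len(spec[0]), "mel bands")
```

In practice a library such as librosa would replace the naive DFT, and the resulting two-dimensional array (frames by mel bands) would be the image-like input passed to a CNN.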
Original language: English
Article number: 2913
Number of pages: 30
Journal: Mathematics
Volume: 10
Issue number: 16
Publication status: Published - 12 Aug 2022
