Code-mixed street address recognition and accent adaptation for voice-activated navigation services

Syed Meesam Raza Naqvi, Muhammad Ali Tahir, Kamran Javed, Hassan Aqeel Khan, Ali Raza, Zubair Saeed

Research output: Contribution to journal › Article › peer-review

Abstract

This study presents the development of a real-time, application-specific Automatic Speech Recognition (ASR) system for voice-activated navigation services. The system is designed to recognize Urdu-English code-mixed street addresses, a task made challenging by the complex nature and structure of such addresses, especially in under-resourced languages such as Urdu. Two separate corpora were collected for ASR system development: a Unicode Urdu corpus of general Urdu recordings (around 61.82 hours from 144 speakers) and a Roman Urdu-English code-mixed address corpus (around 16.89 hours from 20 speakers). The Unicode Urdu data gives the acoustic models general language understanding, while the code-mixed street addresses provide coverage of code-mixing and code-switching. The hybrid ASR system employed in this study plays a crucial role in addressing the challenges of a low-resource setting (only 16.89 hours of task-specific data), especially in the context of Urdu-English code-switching. The study compares several acoustic models; a mixed Time Delay Neural Network and Long Short-Term Memory (TDNN-LSTM) model performs best, with a Word Error Rate (WER), Character Error Rate (CER), and Sentence Error Rate (SER) of 4.02%, 0.8%, and 15.14%, respectively, on random street addresses. In addition to testing on street addresses, we performed accent-based and manual decoding tests on the developed ASR system. The results indicate the need to develop and deploy custom ASR systems for better accent adaptation and application-specific coverage. The developed ASR system is integrated into the TPL Maps (https://tplmaps.com/) mobile application. It is Pakistan’s first real-time Large Vocabulary Continuous Speech Recognition (LVCSR) system to provide Urdu-based voice-activated navigation services.
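The three error rates reported in the abstract can be illustrated with a minimal sketch. It assumes the conventional definitions: WER and CER as Levenshtein edit distance divided by the reference length (in words and characters respectively), and SER as the share of utterances with at least one word error. The abstract does not describe the authors' scoring procedure, so the function names `edit_distance` and `error_rates` and any sample addresses passed to them are hypothetical and purely illustrative.

```python
from typing import Dict, List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from the reference
                            curr[j - 1] + 1,      # insertion into the hypothesis
                            prev[j - 1] + cost))  # substitution or exact match
        prev = curr
    return prev[-1]


def error_rates(refs: List[str], hyps: List[str]) -> Dict[str, float]:
    """Corpus-level WER, CER and SER in percent (assumed standard definitions)."""
    if not refs or len(refs) != len(hyps):
        raise ValueError("refs and hyps must be non-empty and of equal length")
    word_errs = word_total = char_errs = char_total = sent_errs = 0
    for ref, hyp in zip(refs, hyps):
        ref_words, hyp_words = ref.split(), hyp.split()
        word_errs += edit_distance(ref_words, hyp_words)
        word_total += len(ref_words)
        # Characters are compared on whitespace-normalised text, spaces included.
        ref_chars, hyp_chars = " ".join(ref_words), " ".join(hyp_words)
        char_errs += edit_distance(ref_chars, hyp_chars)
        char_total += len(ref_chars)
        # A sentence counts as wrong if any word differs.
        sent_errs += int(ref_words != hyp_words)
    if word_total == 0:
        raise ValueError("references contain no words")
    return {
        "wer": 100.0 * word_errs / word_total,
        "cer": 100.0 * char_errs / char_total,
        "ser": 100.0 * sent_errs / len(refs),
    }
```

Under these definitions a single wrong word makes the whole address count as a sentence error, which is one reason an SER can sit well above the WER measured on the same test set.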

Original language: English
Article number: 168393
Pages (from-to): 168393-168411
Number of pages: 19
Journal: IEEE Access
Volume: 12
Early online date: 12 Nov 2024
Publication status: Published - 22 Nov 2024

Bibliographical note

Copyright © 2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

Keywords

  • Speech recognition
  • Hidden Markov models
  • Acoustics
  • Vocabulary
  • Speech coding
  • Real-time systems
  • Navigation
  • Long short term memory
  • Error analysis
  • Switches
  • Urdu-English code-mixing
  • Roman Urdu addresses
  • accent adaptation
  • Gaussian mixture models
  • voice-activated navigation
  • deep neural network
