Phoneme Aware Speech Synthesis via Fine Tune Transfer Learning with a Tacotron Spectrogram Prediction Network

Jordan J. Bird, Anikó Ekárt, Diego R. Faria

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

The implications of realistic human speech imitation are promising but also potentially dangerous. In this work, a pre-trained Tacotron Spectrogram Feature Prediction Network is fine-tuned with two 1.6 h speech datasets for 100,000 learning iterations, producing two individual models. The two speech datasets are identical in content other than their textual representation: one follows standard written English, whereas the second is an English phonetic representation, in order to study the effect of notation on the learning process. To test imitative abilities post-training, thirty lines of speech are recorded from a human to be imitated. The models then attempt to produce these voice lines themselves, and the acoustic fingerprints of the outputs are compared to the real human speech. On average, English notation achieves 27.36% similarity to the human speaker, whereas phonetic English notation achieves 35.31%. This suggests that representing English through the International Phonetic Alphabet provides more useful training data than written English. These experiments therefore suggest that a phonetic-aware paradigm would improve speech synthesis, similarly to its effect in the field of speech recognition.
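The two reported averages can be put side by side with a little arithmetic. The following is only an illustrative sketch: the function name `compare_similarity` is hypothetical and not part of the paper, and the only inputs are the two average similarity figures quoted in the abstract.

```python
def compare_similarity(english_pct: float, phonetic_pct: float) -> tuple[float, float]:
    """Return (absolute gap in percentage points, relative gain in percent).

    Hypothetical helper for illustration; not taken from the paper.
    """
    gap = phonetic_pct - english_pct          # difference in percentage points
    relative = gap / english_pct * 100.0      # gain relative to the English baseline
    return gap, relative


# Average similarity scores reported in the abstract.
gap, relative = compare_similarity(27.36, 35.31)
print(f"{gap:.2f} percentage points, {relative:.1f}% relative gain")
```

Under these figures, the phonetic model is ahead by about 7.95 percentage points, roughly a 29% gain relative to the written-English model.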
Original language: English
Title of host publication: Advances in Computational Intelligence Systems - Contributions Presented at the 19th UK Workshop on Computational Intelligence, 2019
Editors: Zhaojie Ju, Dalin Zhou, Alexander Gegov, Longzhi Yang, Chenguang Yang
Publisher: Springer
Chapter: 23
Pages: 271-282
Number of pages: 12
Volume: 1043
ISBN (Electronic): 978-3-030-29933-0
ISBN (Print): 978-3-030-29932-3
DOIs: https://doi.org/10.1007/978-3-030-29933-0_23
Publication status: Published - 30 Aug 2019
Event: 19th UK Workshop on Computational Intelligence: UKCI 2019 - Portsmouth, United Kingdom
Duration: 4 Sep 2019 to 6 Sep 2019

Publication series

Name: Advances in Intelligent Systems and Computing
Volume: 1043
ISSN (Print): 2194-5357
ISSN (Electronic): 2194-5365

Conference

Conference: 19th UK Workshop on Computational Intelligence
Country: United Kingdom
City: Portsmouth
Period: 4/09/19 to 6/09/19

Keywords

  • Fine tune learning
  • Fingerprint analysis
  • Phonetic awareness
  • Speech synthesis
  • Tacotron

  • Cite this

    Bird, J. J., Ekárt, A., & Faria, D. R. (2019). Phoneme Aware Speech Synthesis via Fine Tune Transfer Learning with a Tacotron Spectrogram Prediction Network. In Z. Ju, D. Zhou, A. Gegov, L. Yang, & C. Yang (Eds.), Advances in Computational Intelligence Systems - Contributions Presented at the 19th UK Workshop on Computational Intelligence, 2019 (Vol. 1043, pp. 271-282). (Advances in Intelligent Systems and Computing; Vol. 1043). Springer. https://doi.org/10.1007/978-3-030-29933-0_23