A computational model of intonation for Yorùbá text-to-speech synthesis: design and analysis

Odétúnjí A. Odéjobí, Anthony J. Beaumont, Shun Ha Sylvia Wong

Research output: Chapter in Book/Published conference output › Conference publication

Abstract

In this paper we present the design and analysis of an intonation model for text-to-speech (TTS) synthesis applications using a combination of Relational Tree (RT) and Fuzzy Logic (FL) technologies. The model is demonstrated using the Standard Yorùbá (SY) language. In the proposed intonation model, phonological information extracted from text is converted into an RT. An RT is a sophisticated data structure that symbolically represents the peaks and valleys, as well as the spatial structure, of a waveform in the form of trees. An initial approximation to the RT, called a Skeletal Tree (ST), is first generated algorithmically. The exact numerical values of the peaks and valleys on the ST are then computed using FL. Quantitative analysis of the result gives RMSEs of 0.56 and 0.71 for peaks and valleys respectively. Mean Opinion Scores (MOS) of 9.5 and 6.8, on a scale of 1 to 10, were obtained for intelligibility and naturalness respectively.
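For readers unfamiliar with the error measure quoted in the abstract, the following is a minimal sketch of how a root-mean-square error (RMSE) between predicted and observed peak (or valley) values could be computed. The function name `rmse` and the sample values are hypothetical illustrations; they are not taken from the paper and do not reproduce its data or its evaluation procedure.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences of numbers."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    # Mean of the squared differences, then the square root.
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical peak values (arbitrary units), for illustration only.
predicted_peaks = [3.0, 5.0, 4.0]
observed_peaks = [3.0, 4.0, 4.0]

print(rmse(predicted_peaks, observed_peaks))
```

Under this definition a lower value means the model's peaks or valleys lie closer to the measured ones, so the reported 0.56 for peaks would indicate a somewhat closer fit than the 0.71 for valleys.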

Original language: English
Title of host publication: Text, speech and dialogue
Subtitle of host publication: 7th international conference, TSD 2004, Brno, Czech Republic, September 8-11, 2004. Proceedings
Editors: Petr Sojka, Ivan Kopeček, Karel Pala
Place of Publication: Berlin (DE)
Publisher: Springer
Pages: 409-416
Number of pages: 8
Volume: Part III
ISBN (Electronic): 978-3-540-30120-2
ISBN (Print): 978-3-540-23049-6
Publication status: Published - 2004

Publication series

Name: Lecture notes in computer science
Publisher: Springer
Volume: 3206
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349
