TY - GEN
T1 - A computational model of intonation for Yorùbá text-to-speech synthesis
T2 - Design and analysis
AU - Odéjobí, Odétúnjí A.
AU - Beaumont, Anthony J.
AU - Wong, Shun Ha Sylvia
N1 - Copyright 2008 Elsevier B.V., All rights reserved.
PY - 2004
Y1 - 2004
N2 - In this paper we present the design and analysis of an intonation model for text-to-speech (TTS) synthesis applications using a combination of Relational Tree (RT) and Fuzzy Logic (FL) technologies. The model is demonstrated using the Standard Yorùbá (SY) language. In the proposed intonation model, phonological information extracted from text is converted into an RT. An RT is a data structure that symbolically represents the peaks and valleys, as well as the spatial structure, of a waveform in the form of a tree. An initial approximation to the RT, called a Skeletal Tree (ST), is first generated algorithmically. The exact numerical values of the peaks and valleys on the ST are then computed using FL. Quantitative analysis of the results gives root mean square errors (RMSE) of 0.56 and 0.71 for peaks and valleys, respectively. Mean Opinion Scores (MOS) of 9.5 and 6.8, on a scale of 1 to 10, were obtained for intelligibility and naturalness, respectively.
UR - http://www.scopus.com/inward/record.url?scp=22944489587&partnerID=8YFLogxK
UR - http://link.springer.com/chapter/10.1007%2F978-3-540-30120-2_52
DO - 10.1007/978-3-540-30120-2_52
M3 - Conference publication
AN - SCOPUS:22944489587
SN - 978-3-540-23049-6
VL - Part III
T3 - Lecture Notes in Computer Science
SP - 409
EP - 416
BT - Text, Speech and Dialogue
A2 - Sojka, Petr
A2 - Kopeček, Ivan
A2 - Pala, Karel
PB - Springer
CY - Berlin (DE)
ER -