A computational model of prosody for Yorùbá text-to-speech synthesis

Odétúnjí A. Odéjobí

Student thesis: Doctoral Thesis › Doctor of Philosophy

Abstract

This work examines prosody modelling for the Standard Yorùbá (SY) language in the context of computer text-to-speech synthesis applications. The thesis of this research is that a practical prosody model can be developed by using appropriate computational tools and techniques that combine acoustic data with an encoding of the phonological and phonetic knowledge provided by experts. Our prosody model is conceptualised around a modular holistic framework. The framework is implemented using the Relational Tree (R-Tree) technique (Ehrich and Foith, 1976). The R-Tree is a sophisticated data structure that provides a multi-dimensional description of a waveform. A Skeletal Tree (S-Tree) is first generated using algorithms based on the tone phonological rules of SY. Subsequent steps update the S-Tree by computing the numerical values of the prosody dimensions.
To implement the intonation dimension, fuzzy control rules were developed based on data from native speakers of Yorùbá. The Classification And Regression Tree (CART) and the Fuzzy Decision Tree (FDT) techniques were tested in modelling the duration dimension. The FDT was selected based on its better performance.
An important feature of our R-Tree framework is its flexibility: the different dimensions of prosody, i.e. duration and intonation, can be implemented independently using different techniques and then integrated.
Our approach provides us with a flexible and extendible model that can also be used to implement, study and explain the theory behind aspects of the phenomena observed in speech prosody.
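The separation described above, where a skeletal structure is built first and the prosody dimensions are filled in independently afterwards, can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption rather than the thesis's implementation: the `SyllableNode` and `PhraseNode` classes, the tone labels `H`/`M`/`L`, the placeholder duration and F0 numbers, and the two fill-in functions are all hypothetical stand-ins for the fuzzy decision tree (duration) and fuzzy control rules (intonation) named in the abstract.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SyllableNode:
    """Hypothetical leaf of a skeletal tree: one syllable with its tone label."""
    text: str
    tone: str  # assumed labels: "H" (high), "M" (mid), "L" (low)
    duration_ms: Optional[float] = None  # filled in by the duration step
    f0_hz: Optional[float] = None        # filled in by the intonation step


@dataclass
class PhraseNode:
    """Hypothetical internal node grouping syllables into a phrase."""
    children: List[SyllableNode] = field(default_factory=list)


def build_skeleton(syllables):
    """Build the structure only; no numerical prosody values yet."""
    return PhraseNode([SyllableNode(text, tone) for text, tone in syllables])


# Placeholder numbers chosen for illustration, NOT values from the thesis.
ASSUMED_F0 = {"H": 140.0, "M": 120.0, "L": 100.0}
ASSUMED_DURATION_MS = 180.0


def fill_duration(tree):
    """Stand-in for a duration model (the thesis uses a fuzzy decision tree)."""
    for node in tree.children:
        node.duration_ms = ASSUMED_DURATION_MS


def fill_intonation(tree):
    """Stand-in for an intonation model (the thesis uses fuzzy control rules)."""
    for node in tree.children:
        node.f0_hz = ASSUMED_F0[node.tone]


tree = build_skeleton([("ba", "H"), ("ba", "M"), ("ba", "L")])
fill_duration(tree)     # the two steps write to different fields,
fill_intonation(tree)   # so they can run in either order
```

Because each step writes only to its own field, either model could be swapped for a different technique without touching the other, which is the kind of flexibility the abstract attributes to the framework.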

Date of Award	Jul 2005
Original language	English
Supervisor	Tony Beaumont (Supervisor) & Sylvia Wong (Supervisor)

Keywords

prosody modelling
speech synthesis
intonation modelling
duration modelling
fuzzy logic
fuzzy decision tree
relational tree

Documents

A computational model of prosody for Yorùbá text-to-speech synthesis
File: application/pdf, 11.6 MB
Type: Thesis