Intonation contour realisation for Standard Yorùbá text-to-speech synthesis: a fuzzy computational approach

Ọdẹ́túnjí A. Odéjọbí, Anthony J. Beaumont, Shun Ha Sylvia Wong

Research output: Contribution to journal › Article

Abstract

This paper presents a novel intonation modelling approach and demonstrates its applicability using the Standard Yorùbá language. Our approach is motivated by the theory that abstract and realised forms of intonation and other dimensions of prosody should be modelled within a modular and unified framework. In our model, this framework is implemented using the Relational Tree (R-Tree) technique. The R-Tree is a sophisticated data structure for representing a multi-dimensional waveform in the form of a tree. Our R-Tree for an utterance is generated in two steps. First, the abstract structure of the waveform, called the Skeletal Tree (S-Tree), is generated using tone phonological rules for the target language. Second, the numerical values of the perceptually significant peaks and valleys on the S-Tree are computed using a fuzzy-logic-based model. The resulting points are then joined by applying interpolation techniques. The actual intonation contour is synthesised with the Pitch Synchronous Overlap and Add (PSOLA) technique using the Praat software. We performed both quantitative and qualitative evaluations of our model. The preliminary results suggest that, although the model does not predict the numerical speech data as accurately as contemporary data-driven approaches, it produces synthetic speech with comparable intelligibility and naturalness. Furthermore, our model is easy to implement, interpret and adapt to other tone languages.
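The abstract outlines three stages: a skeletal structure derived from tone rules, fuzzy inference for the F0 values of its peaks and valleys, and interpolation between those points. The Python sketch below is a toy illustration of how such stages could fit together. It is not the authors' R-Tree or fuzzy model.

Everything specific in it is a hypothetical placeholder:
- the one-target-per-syllable skeleton (`skeleton`);
- the three triangular position sets (`memberships`);
- the zero-order Takagi-Sugeno rule base (`fuzzy_f0`);
- the syllable duration and every Hz value.

Piecewise-linear interpolation also stands in for whatever interpolation the paper actually uses. The only linguistic facts assumed are the three level tones of Standard Yorùbá (High, Mid, Low) and a simple downward drift of F0 across the utterance.

```python
from dataclasses import dataclass

# HYPOTHETICAL rule consequents (Hz) for a zero-order Takagi-Sugeno system:
# one crisp F0 value per (tone, position) rule. Not taken from the paper.
CONSEQUENTS = {
    "H": {"early": 190.0, "middle": 170.0, "late": 150.0},
    "M": {"early": 150.0, "middle": 138.0, "late": 125.0},
    "L": {"early": 120.0, "middle": 108.0, "late": 95.0},
}
KIND = {"H": "peak", "L": "valley", "M": "level"}


@dataclass
class Target:
    """A skeletal pitch target: one per syllable in this toy model."""
    time: float  # seconds from utterance start (syllable midpoint)
    tone: str    # "H", "M" or "L"
    kind: str    # "peak", "valley" or "level"
    f0: float = 0.0


def skeleton(tones, syllable_dur=0.2):
    """Toy stand-in for S-Tree generation: applies no real tone phonological rules."""
    if not tones:
        raise ValueError("need at least one tone")
    return [Target((i + 0.5) * syllable_dur, t, KIND[t]) for i, t in enumerate(tones)]


def memberships(x):
    """Triangular fuzzy sets over normalised utterance position x in [0, 1]."""
    return {
        "early": max(0.0, 1.0 - 2.0 * x),
        "middle": max(0.0, 1.0 - 2.0 * abs(x - 0.5)),
        "late": max(0.0, 2.0 * x - 1.0),
    }


def fuzzy_f0(tone, x):
    """Zero-order Sugeno inference: firing-strength-weighted mean of consequents."""
    mu = memberships(x)
    num = sum(w * CONSEQUENTS[tone][label] for label, w in mu.items())
    return num / sum(mu.values())


def realise(tones, syllable_dur=0.2, step=0.01):
    """Assign F0 to each target, then join targets by linear interpolation."""
    targets = skeleton(tones, syllable_dur)
    last = len(targets) - 1
    for i, tg in enumerate(targets):
        tg.f0 = fuzzy_f0(tg.tone, i / last if last else 0.0)
    contour = []
    for a, b in zip(targets, targets[1:]):
        n = max(1, round((b.time - a.time) / step))
        for k in range(n):  # excludes b itself; b starts the next segment
            frac = k / n
            contour.append((round(a.time + frac * (b.time - a.time), 3),
                            a.f0 + frac * (b.f0 - a.f0)))
    contour.append((round(targets[-1].time, 3), targets[-1].f0))
    return targets, contour


if __name__ == "__main__":
    targets, contour = realise(["H", "L", "M", "H"])
    for tg in targets:
        print(tg.kind, round(tg.f0, 1))
    print(len(contour), "contour points")
```

With these placeholder rules, the tone sequence H L M H yields one target per syllable: peak 190.0, valley 112.0, level 133.7 and peak 150.0. These are joined into a 61-point contour. In the system the abstract describes, such a contour would then drive PSOLA resynthesis in Praat, which this sketch does not attempt.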
Original language: English
Pages (from-to): 563-588
Number of pages: 26
Journal: Computer Speech and Language
Volume: 20
Issue number: 4
DOI: 10.1016/j.csl.2005.08.006
Publication status: Published - Oct 2006

Cite this

@article{c8a60ab0eb524ccaace0aed32d4d321f,
title = "Intonation contour realisation for {Standard Yor{\`u}b{\'a}} text-to-speech synthesis: a fuzzy computational approach",
abstract = "This paper presents a novel intonation modelling approach and demonstrates its applicability using the Standard Yor{\`u}b{\'a} language. Our approach is motivated by the theory that abstract and realised forms of intonation and other dimensions of prosody should be modelled within a modular and unified framework. In our model, this framework is implemented using the Relational Tree (R-Tree) technique. The R-Tree is a sophisticated data structure for representing a multi-dimensional waveform in the form of a tree. Our R-Tree for an utterance is generated in two steps. First, the abstract structure of the waveform, called the Skeletal Tree (S-Tree), is generated using tone phonological rules for the target language. Second, the numerical values of the perceptually significant peaks and valleys on the S-Tree are computed using a fuzzy-logic-based model. The resulting points are then joined by applying interpolation techniques. The actual intonation contour is synthesised with the Pitch Synchronous Overlap and Add (PSOLA) technique using the Praat software. We performed both quantitative and qualitative evaluations of our model. The preliminary results suggest that, although the model does not predict the numerical speech data as accurately as contemporary data-driven approaches, it produces synthetic speech with comparable intelligibility and naturalness. Furthermore, our model is easy to implement, interpret and adapt to other tone languages.",
author = "Od{\'e}j{\d{o}}b{\'i}, {{\d{O}}d{\'{\d{e}}}t{\'u}nj{\'i} A.} and Beaumont, {Anthony J.} and Wong, {Shun Ha Sylvia}",
note = "Copyright 2008 Elsevier B.V., All rights reserved.",
year = "2006",
month = "10",
doi = "10.1016/j.csl.2005.08.006",
language = "English",
volume = "20",
pages = "563--588",
journal = "Computer Speech and Language",
issn = "0885-2308",
publisher = "Academic Press Inc.",
number = "4",

}

Intonation contour realisation for Standard Yorùbá text-to-speech synthesis: a fuzzy computational approach. / Odéjọbí, Ọdẹ́túnjí A.; Beaumont, Anthony J.; Wong, Shun Ha Sylvia.

In: Computer Speech and Language, Vol. 20, No. 4, 10.2006, pp. 563-588.

TY  - JOUR
T1  - Intonation contour realisation for Standard Yorùbá text-to-speech synthesis: a fuzzy computational approach
AU  - Odéjọbí, Ọdẹ́túnjí A.
AU  - Beaumont, Anthony J.
AU  - Wong, Shun Ha Sylvia
N1  - Copyright 2008 Elsevier B.V., All rights reserved.
PY  - 2006/10
Y1  - 2006/10
AB  - This paper presents a novel intonation modelling approach and demonstrates its applicability using the Standard Yorùbá language. Our approach is motivated by the theory that abstract and realised forms of intonation and other dimensions of prosody should be modelled within a modular and unified framework. In our model, this framework is implemented using the Relational Tree (R-Tree) technique. The R-Tree is a sophisticated data structure for representing a multi-dimensional waveform in the form of a tree. Our R-Tree for an utterance is generated in two steps. First, the abstract structure of the waveform, called the Skeletal Tree (S-Tree), is generated using tone phonological rules for the target language. Second, the numerical values of the perceptually significant peaks and valleys on the S-Tree are computed using a fuzzy-logic-based model. The resulting points are then joined by applying interpolation techniques. The actual intonation contour is synthesised with the Pitch Synchronous Overlap and Add (PSOLA) technique using the Praat software. We performed both quantitative and qualitative evaluations of our model. The preliminary results suggest that, although the model does not predict the numerical speech data as accurately as contemporary data-driven approaches, it produces synthetic speech with comparable intelligibility and naturalness. Furthermore, our model is easy to implement, interpret and adapt to other tone languages.
UR  - http://www.scopus.com/inward/record.url?scp=33746622090&partnerID=8YFLogxK
UR  - https://www.sciencedirect.com/science/article/pii/S0885230805000525
U2  - 10.1016/j.csl.2005.08.006
DO  - 10.1016/j.csl.2005.08.006
M3  - Article
AN  - SCOPUS:33746622090
VL  - 20
SP  - 563
EP  - 588
JO  - Computer Speech and Language
JF  - Computer Speech and Language
SN  - 0885-2308
IS  - 4
ER  -