Intonation contour realisation for Standard Yorùbá text-to-speech synthesis: a fuzzy computational approach

Ọdẹ́túnjí A. Odéjọbí; Anthony J. Beaumont; Shun Ha Sylvia Wong

doi:10.1016/j.csl.2005.08.006

Computer Science Research Group

Research output: Contribution to journal › Article › peer-review

Abstract

This paper presents a novel intonation modelling approach and demonstrates its applicability using the Standard Yorùbá language. Our approach is motivated by the theory that abstract and realised forms of intonation and other dimensions of prosody should be modelled within a modular and unified framework. In our model, this framework is implemented using the Relational Tree (R-Tree) technique. The R-Tree is a sophisticated data structure for representing a multi-dimensional waveform in the form of a tree. Our R-Tree for an utterance is generated in two steps. First, the abstract structure of the waveform, called the Skeletal Tree (S-Tree), is generated using tone phonological rules for the target language. Second, the numerical values of the perceptually significant peaks and valleys on the S-Tree are computed using a fuzzy logic based model. The resulting points are then joined by applying interpolation techniques. The actual intonation contour is synthesised by Pitch Synchronous Overlap Technique (PSOLA) using the Praat software. We performed both quantitative and qualitative evaluations of our model. The preliminary results suggest that, although the model does not predict the numerical speech data as accurately as contemporary data-driven approaches, it produces synthetic speech with comparable intelligibility and naturalness. Furthermore, our model is easy to implement, interpret and adapt to other tone languages.
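The step in which the computed peaks and valleys are joined can be illustrated with a minimal sketch. The abstract says only that "interpolation techniques" are applied, so the piecewise-linear interpolation used here is an assumption, not the authors' stated method. The function name `interpolate_contour` and the anchor values are hypothetical, and the fuzzy logic model that would produce the anchors is not reproduced.

```python
from bisect import bisect_right


def interpolate_contour(points, times):
    """Return F0 values (Hz) at the given times by joining anchor points.

    points: (time_s, f0_hz) pairs standing in for the peaks and valleys
            that a model would supply (hypothetical values in this sketch).
    times:  query times in seconds.
    Assumption: piecewise-linear interpolation; values outside the anchor
    range are held at the nearest anchor.
    """
    pts = sorted(points)
    xs = [t for t, _ in pts]
    contour = []
    for t in times:
        if t <= xs[0]:
            contour.append(pts[0][1])
        elif t >= xs[-1]:
            contour.append(pts[-1][1])
        else:
            # Index of the first anchor strictly to the right of t.
            i = bisect_right(xs, t)
            (t0, f0), (t1, f1) = pts[i - 1], pts[i]
            contour.append(f0 + (f1 - f0) * (t - t0) / (t1 - t0))
    return contour


# Hypothetical anchors: a valley, a peak, and a final valley.
anchors = [(0.0, 120.0), (0.2, 160.0), (0.4, 110.0)]
contour = interpolate_contour(anchors, [0.0, 0.1, 0.2, 0.3, 0.4])
```

A smoother interpolant, such as a spline, could be substituted without changing the interface. The resulting contour would then be passed to a resynthesis tool, which the abstract identifies as PSOLA in Praat.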

Original language	English
Pages (from-to)	563-588
Number of pages	26
Journal	Computer Speech and Language
Volume	20
Issue number	4
DOIs	https://doi.org/10.1016/j.csl.2005.08.006
Publication status	Published - Oct 2006

Cite this

@article{c8a60ab0eb524ccaace0aed32d4d321f,

title = "Intonation contour realisation for Standard Yor{\`u}b{\'a} text-to-speech synthesis: a fuzzy computational approach",

abstract = "This paper presents a novel intonation modelling approach and demonstrates its applicability using the Standard Yor{\`u}b{\'a} language. Our approach is motivated by the theory that abstract and realised forms of intonation and other dimensions of prosody should be modelled within a modular and unified framework. In our model, this framework is implemented using the Relational Tree (R-Tree) technique. The R-Tree is a sophisticated data structure for representing a multi-dimensional waveform in the form of a tree. Our R-Tree for an utterance is generated in two steps. First, the abstract structure of the waveform, called the Skeletal Tree (S-Tree), is generated using tone phonological rules for the target language. Second, the numerical values of the perceptually significant peaks and valleys on the S-Tree are computed using a fuzzy logic based model. The resulting points are then joined by applying interpolation techniques. The actual intonation contour is synthesised by Pitch Synchronous Overlap Technique (PSOLA) using the Praat software. We performed both quantitative and qualitative evaluations of our model. The preliminary results suggest that, although the model does not predict the numerical speech data as accurately as contemporary data-driven approaches, it produces synthetic speech with comparable intelligibility and naturalness. Furthermore, our model is easy to implement, interpret and adapt to other tone languages.",

author = "Od{\'e}jọb{\'i}, {Ọdẹ́t{\'u}nj{\'i} A.} and Beaumont, {Anthony J.} and Wong, {Shun Ha Sylvia}",

year = "2006",

month = oct,

doi = "10.1016/j.csl.2005.08.006",

language = "English",

volume = "20",

pages = "563--588",

journal = "Computer Speech and Language",

issn = "0885-2308",

publisher = "Academic Press Inc.",

number = "4",

}

TY - JOUR

T1 - Intonation contour realisation for Standard Yorùbá text-to-speech synthesis

T2 - a fuzzy computational approach

AU - Odéjọbí, Ọdẹ́túnjí A.

AU - Beaumont, Anthony J.

AU - Wong, Shun Ha Sylvia

PY - 2006/10

Y1 - 2006/10

N2 - This paper presents a novel intonation modelling approach and demonstrates its applicability using the Standard Yorùbá language. Our approach is motivated by the theory that abstract and realised forms of intonation and other dimensions of prosody should be modelled within a modular and unified framework. In our model, this framework is implemented using the Relational Tree (R-Tree) technique. The R-Tree is a sophisticated data structure for representing a multi-dimensional waveform in the form of a tree. Our R-Tree for an utterance is generated in two steps. First, the abstract structure of the waveform, called the Skeletal Tree (S-Tree), is generated using tone phonological rules for the target language. Second, the numerical values of the perceptually significant peaks and valleys on the S-Tree are computed using a fuzzy logic based model. The resulting points are then joined by applying interpolation techniques. The actual intonation contour is synthesised by Pitch Synchronous Overlap Technique (PSOLA) using the Praat software. We performed both quantitative and qualitative evaluations of our model. The preliminary results suggest that, although the model does not predict the numerical speech data as accurately as contemporary data-driven approaches, it produces synthetic speech with comparable intelligibility and naturalness. Furthermore, our model is easy to implement, interpret and adapt to other tone languages.

UR - http://www.scopus.com/inward/record.url?scp=33746622090&partnerID=8YFLogxK

UR - https://www.sciencedirect.com/science/article/pii/S0885230805000525?via%3Dihub

U2 - 10.1016/j.csl.2005.08.006

DO - 10.1016/j.csl.2005.08.006

M3 - Article

AN - SCOPUS:33746622090

SN - 0885-2308

VL - 20

SP - 563

EP - 588

JO - Computer Speech and Language

JF - Computer Speech and Language

IS - 4

ER -
