Second International Conference on Spoken Language Processing (ICSLP'92)

Banff, Alberta, Canada
October 13-16, 1992

A Nucleus-Based Timing Model Applied to Multi-Dialect Speech Synthesis By Rule

Susan R. Hertz (1), Marie K. Huffman (2)

(1) Eloquent Technology, Inc., Ithaca, N.Y., USA; (2) Dept. of Modern Languages and Linguistics, Cornell University, USA

This paper presents a new timing model for rule-based speech synthesis. The model underlies the rules we are developing for five American English dialects, and we are beginning to extend it to other languages. It leads to extremely efficient development of high-quality rules for different dialects and languages and, more generally, provides new insights into the nature of speech. The paper presents the basic tenets of the model, showing how it leads to generalizations about speech patterns within and across dialects (and languages) that cannot be captured in more conventional models. The introduction discusses the nature of utterance representations structured in accordance with our model, using an example from General American English. The second section illustrates our application of the model to rule-based synthesis of American English dialects, focussing in particular on the straightforward and accurate rules for formant timing that the model makes possible. The third section discusses the directions our work is currently taking, including the development of a novel multi-dialect relational database for extracting cross-dialect and intra-dialect generalizations, and experiments aimed at determining when observed variability in timing patterns within and across dialects is perceptually important.

