8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Improving the Accuracy of Pronunciation Prediction for Unit Selection TTS

Justin Fackrell, Wojciech Skut, Kathrine Hammervold

Rhetorical Systems Ltd., U.K.

This paper describes a technique that improves the accuracy of pronunciation prediction for unit selection TTS by performing an orthography-based, context-dependent lookup on the unit database. During synthesis, words whose contexts match entries in the unit database are given the pronunciations found there. Pronunciations not found by this method are determined using traditional lexicon lookup and/or letter-to-sound rules. In its simplest form, the model involves a lookup based on left and right word context. A modified form, which backs off to a lookup based on right context only, is shown to have a much higher firing rate and to produce more pronunciation variation. The technique is good at occasionally inhibiting vowel reduction, at choosing appropriate pronunciations in cases of free variation, and at choosing the correct pronunciation for names. Its effectiveness is assessed by experiments on unseen data, by resynthesis, and by a listening test on sentences rich in reducible words.
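The lookup cascade described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (build_index, predict), the boundary marker, the choice of the most frequent pronunciation when several match, the toy phone strings, and the toy letter-to-sound function are all assumptions made for illustration. The sketch tries a lookup keyed on left and right word context, backs off to right context only, and then falls back to a lexicon and finally to letter-to-sound rules.

```python
from collections import Counter, defaultdict

BOUNDARY = "<s>"  # hypothetical utterance-boundary marker (assumption)


def build_index(utterances):
    """Index database pronunciations by word context.

    `utterances` is a list of utterances, each a list of
    (orthographic word, pronunciation string) pairs.
    """
    full = defaultdict(Counter)        # key: (left, word, right)
    right_only = defaultdict(Counter)  # key: (word, right)
    for utt in utterances:
        padded = [BOUNDARY] + [w.lower() for w, _ in utt] + [BOUNDARY]
        for i, (_, pron) in enumerate(utt):
            left, word, right = padded[i], padded[i + 1], padded[i + 2]
            full[(left, word, right)][pron] += 1
            right_only[(word, right)][pron] += 1
    return full, right_only


def predict(words, full, right_only, lexicon, letter_to_sound):
    """Return (word, pronunciation, source) for each input word."""
    words = [w.lower() for w in words]
    padded = [BOUNDARY] + words + [BOUNDARY]
    result = []
    for i, word in enumerate(words):
        left, right = padded[i], padded[i + 2]
        if (left, word, right) in full:
            # Simplest form: left and right context both match.
            pron = full[(left, word, right)].most_common(1)[0][0]
            source = "left+right"
        elif (word, right) in right_only:
            # Back-off: only the right context matches.
            pron = right_only[(word, right)].most_common(1)[0][0]
            source = "right"
        elif word in lexicon:
            pron, source = lexicon[word], "lexicon"
        else:
            pron, source = letter_to_sound(word), "lts"
        result.append((word, pron, source))
    return result


# Toy data; the phone strings are illustrative, not from the paper.
DATABASE = [
    [("I", "ay"), ("went", "w eh n t"), ("to", "t uw"), ("Oslo", "oz l ow")],
    [("back", "b ae k"), ("to", "t ax"), ("the", "dh ax"), ("house", "hh aw s")],
]
LEXICON = {"to": "t ax", "the": "dh ax", "walk": "w ao k"}


def toy_lts(word):
    """Stand-in for letter-to-sound rules: one symbol per letter."""
    return " ".join(word)


FULL, RIGHT_ONLY = build_index(DATABASE)

if __name__ == "__main__":
    for row in predict(["walk", "to", "the", "zoo"], FULL, RIGHT_ONLY, LEXICON, toy_lts):
        print(row)
```

In this toy setup, the share of words resolved by the first two branches plays the role of what the abstract calls the firing rate. Because the right-context key is less specific than the left-and-right key, it matches more often, which is consistent with the higher firing rate reported for the back-off form.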


Bibliographic reference. Fackrell, Justin / Skut, Wojciech / Hammervold, Kathrine (2003): "Improving the accuracy of pronunciation prediction for unit selection TTS", in EUROSPEECH-2003, 2473-2476.