EUROSPEECH 2003 - INTERSPEECH 2003
This paper describes a technique that improves the accuracy of pronunciation prediction for unit selection TTS. It does this by performing an orthography-based, context-dependent lookup on the unit database. During synthesis, the pronunciations of words that have matching contexts in the unit database are taken from the database. Pronunciations not found using this method are determined using traditional lexicon lookup and/or letter-to-sound rules. In its simplest form, the model involves a lookup based on left and right word context. A modified form, which backs off to a lookup based on right context, is shown to have a much higher firing rate and to produce more pronunciation variation. The technique is good at occasionally inhibiting vowel reduction, at choosing appropriate pronunciations in cases of free variation, and at choosing the correct pronunciation for names. Its effectiveness is assessed by experiments on unseen data, by resynthesis, and by a listening test on sentences rich in reducible words.
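The lookup order described in the abstract (left and right word context, then right context, then lexicon or letter-to-sound rules) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `build_indexes`, `predict`, `BOUNDARY` and `fallback` are hypothetical, as are the corpus format (lists of lowercase word/pronunciation pairs), the sentence-boundary marker, and the choice to keep the first pronunciation seen for a given context; the abstract specifies none of these.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: one sentence of the unit database is a list of
# (orthographic word, pronunciation string) pairs.
Sentence = List[Tuple[str, str]]

BOUNDARY = "<s>"  # assumed marker for sentence edges


def build_indexes(corpus: List[Sentence]):
    """Index recorded pronunciations by orthographic word context."""
    full: Dict[Tuple[str, str, str], str] = {}   # (left, word, right)
    right: Dict[Tuple[str, str], str] = {}       # (word, right)
    for sentence in corpus:
        words = [BOUNDARY] + [w for w, _ in sentence] + [BOUNDARY]
        for i, (word, pron) in enumerate(sentence, start=1):
            # Assumption: keep the first pronunciation seen for a context.
            full.setdefault((words[i - 1], word, words[i + 1]), pron)
            right.setdefault((word, words[i + 1]), pron)
    return full, right


def predict(words: List[str], full, right,
            fallback: Callable[[str], str]) -> List[Tuple[str, str, str]]:
    """Return (word, pronunciation, source) for each input word.

    Tries the left+right context first, backs off to right context, and
    finally calls `fallback`, which stands in for lexicon lookup and/or
    letter-to-sound rules.
    """
    padded = [BOUNDARY] + words + [BOUNDARY]
    result = []
    for i, word in enumerate(words, start=1):
        left_word, right_word = padded[i - 1], padded[i + 1]
        pron = full.get((left_word, word, right_word))
        source = "left+right"
        if pron is None:
            pron = right.get((word, right_word))
            source = "right"
        if pron is None:
            pron = fallback(word)
            source = "fallback"
        result.append((word, pron, source))
    return result
```

In this sketch, a word fires at the "left+right" level only when the whole three-word context occurs in the database, so backing off to the two-word key can only increase the number of words resolved from the database, which matches the higher firing rate reported for the modified form.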
Bibliographic reference. Fackrell, Justin / Skut, Wojciech / Hammervold, Kathrine (2003): "Improving the accuracy of pronunciation prediction for unit selection TTS", In EUROSPEECH-2003, 2473-2476.