INTERSPEECH 2004 - ICSLP
8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Reconciling Pronunciation Differences between the Front-End and the Back-End in the IBM Speech Synthesis System

Wael Hamza, Ellen Eide, Raimo Bakis

IBM T.J. Watson Research Center, USA

This paper presents methods for reconciling differences between the pronunciations produced by a rule-based front-end and those observed in a database of recorded speech. The methods are applied to the IBM Expressive Speech Synthesis System [1] for both unrestricted and limited-domain text-to-speech synthesis. The first method constructs a multiple-pronunciation lattice for the given sentence and scores it using word and phoneme n-gram statistics computed from the target speaker's database. The second method stores the observed pronunciations and introduces them as alternates in the search. We compare the strengths and weaknesses of the two methods. Results show improvements in both limited and unrestricted domains, with the largest gains in the limited-domain case.
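The lattice-scoring idea can be illustrated with a small sketch. It is not the authors' implementation: it assumes phoneme bigrams only (the paper also uses word n-grams), add-one smoothing, and a toy speaker database, and the names `train_bigrams`, `logp`, and `best_path` are hypothetical. Each word contributes a list of alternative pronunciations, and a dynamic-programming pass picks the path with the highest log-probability under the speaker's bigram statistics.

```python
import math
from collections import Counter


def train_bigrams(corpus):
    """Count phoneme bigrams over utterances (each a list of phonemes)."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for utt in corpus:
        seq = ["<s>"] + list(utt)  # "<s>" marks the utterance start
        vocab.update(seq)
        for a, b in zip(seq, seq[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts, len(vocab)


def logp(model, a, b):
    """Add-one smoothed log P(b | a); the smoothing is an assumption."""
    bigrams, contexts, vocab_size = model
    return math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))


def best_path(lattice, model):
    """Pick one pronunciation per word, maximising the bigram log-probability.

    lattice: one entry per word, each a list of alternative pronunciations
    (non-empty tuples of phonemes). With bigrams, only the last phoneme of
    the previous word matters, so one hypothesis per alternate is enough.
    """
    prev = [(0.0, [], "<s>")]  # (score, chosen pronunciations, last phoneme)
    for alternates in lattice:
        cur = []
        for pron in alternates:
            # Score of the phoneme transitions inside this pronunciation.
            internal = sum(logp(model, a, b) for a, b in zip(pron, pron[1:]))
            # Best predecessor, including the word-boundary transition.
            score, path, _ = max(
                ((s + logp(model, last, pron[0]), p, last) for s, p, last in prev),
                key=lambda t: t[0],
            )
            cur.append((score + internal, path + [pron], pron[-1]))
        prev = cur
    score, path, _ = max(prev, key=lambda t: t[0])
    return path, score


# Toy speaker database in which "either" is always spoken as IY DH ER.
speaker_db = [
    ["IY", "DH", "ER", "W", "EY"],
    ["IY", "DH", "ER", "W", "AH", "N"],
]
model = train_bigrams(speaker_db)
lattice = [
    [("AY", "DH", "ER"), ("IY", "DH", "ER")],  # two candidates for "either"
    [("W", "EY")],                             # "way"
]
path, score = best_path(lattice, model)
print(path, score)
```

Because the score is an unnormalised log-probability, longer pronunciations are penalised relative to shorter ones; a fuller treatment would need to address that, along with combining word-level and phoneme-level statistics.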


Bibliographic reference. Hamza, Wael / Eide, Ellen / Bakis, Raimo (2004): "Reconciling pronunciation differences between the front-end and the back-end in the IBM speech synthesis system", in INTERSPEECH-2004, 2561-2564.