INTERSPEECH 2004 - ICSLP
This paper presents methods to reconcile differences between the pronunciations produced by a rule-based front-end and those observed in a database of recorded speech. The methods are applied to the IBM Expressive Speech Synthesis System for both unrestricted and limited-domain text-to-speech synthesis. The first method constructs a multiple-pronunciation lattice for the given sentence and scores it using word and phoneme n-gram statistics computed from the target speaker's database. The second method stores observed pronunciations and introduces them as alternates in the search. We compare the strengths and weaknesses of the two methods. Results show improvements in both limited and unrestricted domains, with the largest gains in the limited-domain case.
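As a rough illustration of the lattice-scoring idea, the following Python sketch picks one pronunciation per word by scoring every path through a small lattice with phoneme bigram statistics. It is not the system's implementation: it uses only phoneme bigrams with add-one smoothing (the abstract mentions both word and phoneme n-grams), it enumerates paths exhaustively instead of running a dynamic-programming search, and the function names, phone labels, toy data, and the inventory size of 50 are all assumptions made for the example.

```python
import math
from collections import Counter
from itertools import product


def train_bigrams(pronunciations):
    """Count phoneme unigrams and bigrams over observed phoneme sequences."""
    unigrams, bigrams = Counter(), Counter()
    for phones in pronunciations:
        padded = ["<s>"] + list(phones) + ["</s>"]
        unigrams.update(padded[:-1])  # history counts only
        bigrams.update(zip(padded[:-1], padded[1:]))
    return unigrams, bigrams


def log_prob(phones, unigrams, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability of a phoneme sequence."""
    padded = ["<s>"] + list(phones) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded[:-1], padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def best_path(lattice, unigrams, bigrams, vocab_size):
    """Return the highest-scoring choice of one pronunciation per word.

    `lattice` is a list with one entry per word; each entry lists the
    alternate pronunciations (tuples of phones) for that word.
    """
    best, best_score = None, -math.inf
    for path in product(*lattice):  # exhaustive; fine for a toy lattice
        phones = [p for word in path for p in word]
        score = log_prob(phones, unigrams, bigrams, vocab_size)
        if score > best_score:
            best, best_score = path, score
    return best


# Hypothetical speaker database: this speaker says "either" with IY.
observed = [("IY", "DH", "ER"), ("IY", "DH", "ER"), ("DH", "AH")]
unigrams, bigrams = train_bigrams(observed)

# Hypothetical front-end alternates for the two words "either one".
lattice = [
    [("IY", "DH", "ER"), ("AY", "DH", "ER")],
    [("W", "AH", "N")],
]
best = best_path(lattice, unigrams, bigrams, vocab_size=50)
print(best)
```

With this toy data the path containing the IY variant scores higher, because the bigrams it contains were seen in the speaker's recordings while those of the AY variant were not.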
Bibliographic reference. Hamza, Wael / Eide, Ellen / Bakis, Raimo (2004): "Reconciling pronunciation differences between the front-end and the back-end in the IBM speech synthesis system", In INTERSPEECH-2004, 2561-2564.