Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Optimization of Text-to-Speech Phonetic Transcriptions using A-Posteriori Signal Comparison

S. Revelin (1), D. Cadic (2), C. Waast-Richard (1)

(1) IBM France, France; (2) France Télécom, France

One issue arising in text-to-phone conversion is inconsistency between its output and the phonetic time-alignment of the dataset, which hinders the back-end's ability to access the best units for synthesizing a text. Some inconsistency of this kind is inevitable, because dataset labeling must allow for alternate pronunciations of words, while the front-end typically predicts a single pronunciation per word. In this paper we describe an alternative algorithm that recovers from these inconsistencies. The front-end is modified to allow multiple pronunciations for a word, and the best pronunciation is selected by an a posteriori comparison of the cost functions of the resulting synthetic signals.
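
The selection step described above might be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the names `best_transcription`, `toy_cost`, and `TOY_INVENTORY` are hypothetical, and the toy cost (the number of adjacent phone pairs missing from a small inventory) merely stands in for whatever cost the real back-end reports after synthesizing each candidate.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Pronunciation = Tuple[str, ...]

# Hypothetical stand-in for the unit database: phone pairs assumed to be available.
TOY_INVENTORY = {("ay", "dh"), ("dh", "er")}


def toy_cost(phones: Sequence[str]) -> float:
    """Assumed cost: count adjacent phone pairs absent from TOY_INVENTORY."""
    pairs = zip(phones, phones[1:])
    return float(sum(1 for pair in pairs if pair not in TOY_INVENTORY))


def best_transcription(
    words: Sequence[str],
    lexicon: Dict[str, List[Pronunciation]],
    synthesis_cost: Callable[[Sequence[str]], float],
) -> Tuple[List[Pronunciation], float]:
    """Try every combination of candidate pronunciations and keep the cheapest.

    The cost is evaluated a posteriori, i.e. once per complete candidate
    transcription, rather than being predicted by the front-end in advance.
    """
    candidates = [lexicon[word] for word in words]
    best: List[Pronunciation] = []
    best_cost = float("inf")
    for combo in product(*candidates):
        # Flatten the per-word pronunciations into one phone sequence.
        phones = [phone for pron in combo for phone in pron]
        cost = synthesis_cost(phones)
        if cost < best_cost:
            best, best_cost = list(combo), cost
    return best, best_cost


lexicon = {"either": [("iy", "dh", "er"), ("ay", "dh", "er")]}
choice, cost = best_transcription(["either"], lexicon, toy_cost)
print(choice, cost)
```

Exhaustive enumeration grows exponentially with the number of words that have several pronunciations, so a practical system would presumably restrict the candidates or search them more cleverly; the abstract does not say how this is handled.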

Bibliographic reference.  Revelin, S. / Cadic, D. / Waast-Richard, C. (2005): "Optimization of text-to-speech phonetic transcriptions using a-posteriori signal comparison", In INTERSPEECH-2005, 1885-1888.