Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Phonetic Transcription Verification with Generalized Posterior Probability

Lijuan Wang (1), Yong Zhao (2), Min Chu (2), Frank K. Soong (2), Zhigang Cao (1)

(1) Tsinghua University, Beijing, China; (2) Microsoft Research Asia, Beijing, China

Accurate phonetic transcription is critical to high-quality concatenation-based text-to-speech (TTS) synthesis. In this paper, we propose using generalized syllable posterior probability (GSPP) as a statistical confidence measure to detect errors in the phonetic transcriptions of a TTS speech database, such as reading errors, inadequate pronunciation alternatives in the lexicon, letter-to-sound errors in transcribing out-of-vocabulary words, and idiosyncratic pronunciations. GSPP is computed from a syllable graph generated by a recognition decoder. In tests on two data sets, GSPP is shown to be effective in locating phonetic transcription errors, with equal error rates (EERs) of 8.2% and 8.4%, respectively. The GSPP verification performance is also found to be fairly stable over a wide range around the optimal value of the acoustic model exponential weight used in computing GSPP.
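For readers unfamiliar with the equal error rate quoted above, the following is a minimal sketch of how an EER can be estimated from per-syllable confidence scores. It is not taken from the paper: the function name `equal_error_rate`, the toy scores, and the threshold sweep are illustrative assumptions, and the abstract does not state how the authors computed their operating point. A syllable is accepted as correctly transcribed when its score is at or above the threshold; the EER is the point where the false acceptance rate (erroneous syllables accepted) equals the false rejection rate (correct syllables rejected).

```python
def equal_error_rate(scores, is_correct):
    """Estimate the EER of a confidence measure by sweeping thresholds.

    scores     -- confidence values, higher meaning "more likely correct"
                  (hypothetical stand-ins for GSPP values)
    is_correct -- booleans, True if the transcription is actually correct
    Returns (eer, threshold) at the threshold where the false acceptance
    and false rejection rates are closest.
    """
    pairs = list(zip(scores, is_correct))
    n_correct = sum(1 for _, ok in pairs if ok)
    n_error = len(pairs) - n_correct
    if n_correct == 0 or n_error == 0:
        raise ValueError("need at least one correct and one erroneous item")

    best = None
    # Candidate thresholds: every observed score, plus "reject everything".
    for t in sorted(set(scores)) + [float("inf")]:
        # False acceptance: an erroneous item scores at or above t.
        fa = sum(1 for s, ok in pairs if not ok and s >= t) / n_error
        # False rejection: a correct item scores below t.
        fr = sum(1 for s, ok in pairs if ok and s < t) / n_correct
        gap = abs(fa - fr)
        if best is None or gap < best[0]:
            best = (gap, (fa + fr) / 2, t)
    return best[1], best[2]


# Toy data (invented for illustration, not from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [True, True, True, False, True, False, False, False]
eer, threshold = equal_error_rate(toy_scores, toy_labels)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With a finite sample the two error rates rarely cross exactly, so this sketch reports their average at the closest threshold; interpolating between neighbouring thresholds is a common alternative.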

Bibliographic reference.  Wang, Lijuan / Zhao, Yong / Chu, Min / Soong, Frank K. / Cao, Zhigang (2005): "Phonetic transcription verification with generalized posterior probability", In INTERSPEECH-2005, 1949-1952.