ISCA Archive Interspeech 2005
Phonetic transcription verification with generalized posterior probability

Lijuan Wang, Yong Zhao, Min Chu, Frank K. Soong, Zhigang Cao

Accurate phonetic transcription is critical to high-quality, concatenation-based text-to-speech (TTS) synthesis. In this paper, we propose using generalized syllable posterior probability (GSPP) as a statistical confidence measure for detecting errors in the phonetic transcriptions of a TTS speech database. Such errors include reading errors, inadequate pronunciation alternatives in the lexicon, letter-to-sound errors in transcribing out-of-vocabulary words, and idiosyncratic pronunciations. GSPP is computed from a syllable graph generated by a recognition decoder. Tested on two data sets, GSPP proves effective in locating phonetic transcription errors, with equal error rates (EERs) of 8.2% and 8.4%, respectively. The verification performance is also fairly stable over a wide range of values around the optimum of the acoustic model exponential weight used in computing GSPP.
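As an illustration only, the sketch below shows two ideas mentioned in the abstract: a posterior-style confidence score whose acoustic log-likelihoods are scaled by an exponential weight, and an equal error rate computed by thresholding such scores. It is not the authors' GSPP formulation, which is computed over a syllable graph and is not specified in this abstract. The function names `scaled_posterior` and `equal_error_rate`, the flat list of competing hypotheses, and the toy numbers are all assumptions made for this example.

```python
import math


def scaled_posterior(log_likelihoods, target, alpha):
    """Posterior of one hypothesis among competitors (toy stand-in, not GSPP).

    log_likelihoods: acoustic log-likelihoods of competing hypotheses
    target: index of the hypothesis being verified
    alpha: exponential weight applied to the acoustic scores (assumed name)
    """
    scaled = [alpha * ll for ll in log_likelihoods]
    peak = max(scaled)  # subtract the maximum for numerical stability
    denom = sum(math.exp(s - peak) for s in scaled)
    return math.exp(scaled[target] - peak) / denom


def equal_error_rate(scores, is_error):
    """Return (eer, threshold) for a reject-if-score-below-threshold rule.

    scores: confidence score per transcribed unit
    is_error: True where the transcription is actually wrong
    """
    n_err = sum(1 for e in is_error if e)
    n_ok = len(is_error) - n_err
    if n_err == 0 or n_ok == 0:
        raise ValueError("need at least one correct and one erroneous unit")

    best = None
    # Candidate thresholds: every observed score, plus one above the maximum.
    for t in sorted(set(scores)) + [max(scores) + 1.0]:
        # False alarm: a correct unit flagged as an error (score < t).
        false_alarm = sum(
            1 for s, e in zip(scores, is_error) if s < t and not e
        ) / n_ok
        # Miss: an erroneous unit accepted as correct (score >= t).
        miss = sum(
            1 for s, e in zip(scores, is_error) if s >= t and e
        ) / n_err
        gap = abs(false_alarm - miss)
        if best is None or gap < best[0]:
            best = (gap, (false_alarm + miss) / 2.0, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy data: two wrong and two correct transcriptions (made-up scores).
    eer, threshold = equal_error_rate(
        [0.1, 0.6, 0.4, 0.9], [True, True, False, False]
    )
    print(eer, threshold)
```

In this toy setup, a larger `alpha` sharpens the posterior toward the best-scoring hypothesis, while `alpha = 0` makes all competitors equally likely. Sweeping `alpha` and recomputing the EER is one way to probe the kind of stability the abstract reports.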

doi: 10.21437/Interspeech.2005-609

Cite as: Wang, L., Zhao, Y., Chu, M., Soong, F.K., Cao, Z. (2005) Phonetic transcription verification with generalized posterior probability. Proc. Interspeech 2005, 1949-1952, doi: 10.21437/Interspeech.2005-609

