Interspeech'2005 - Eurospeech
Accurate phonetic transcription is critical to high-quality, concatenation-based text-to-speech (TTS) synthesis. In this paper, we propose using the generalized syllable posterior probability (GSPP) as a statistical confidence measure to detect errors in the phonetic transcriptions of a TTS speech database, such as reading errors, inadequate pronunciation alternatives in the lexicon, letter-to-sound errors in transcribing out-of-vocabulary words, and idiosyncratic pronunciations. GSPP is computed from a syllable graph generated by a recognition decoder. Tested on two data sets, the proposed GSPP is shown to be effective in locating phonetic transcription errors, yielding equal error rates (EERs) of 8.2% and 8.4%, respectively. We also find that GSPP verification performance is fairly stable over a wide range of values around the optimal acoustic model exponential weight used in computing GSPP.
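To make the idea of a generalized posterior concrete, here is a minimal sketch, not the authors' implementation. It assumes the syllable graph has been expanded into a small list of competing hypotheses (an N-best stand-in for the graph), and that each arc carries an acoustic log-likelihood and a language model log-probability. The posterior of a syllable over a time span is taken as the exponentially weighted score mass of every hypothesis containing that syllable with an overlapping time span, normalized by the mass of all hypotheses. The names `Arc`, `gspp`, `alpha` (acoustic exponential weight) and `beta` (a language model weight) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Arc:
    """One syllable hypothesis in a decoded path (hypothetical structure)."""
    syllable: str
    start: int          # first frame (inclusive)
    end: int            # last frame (inclusive)
    ac_logprob: float   # log p(x_start..end | syllable)
    lm_logprob: float   # log p(syllable | history)


Hypothesis = List[Arc]


def _logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def _path_score(hyp: Hypothesis, alpha: float, beta: float) -> float:
    """Log of prod p^alpha(x|syl) * p^beta(syl|history) along one path."""
    return sum(alpha * a.ac_logprob + beta * a.lm_logprob for a in hyp)


def _overlaps(arc: Arc, s: int, t: int) -> bool:
    """True if the arc's frame span intersects [s, t]."""
    return arc.start <= t and s <= arc.end


def gspp(syllable: str, s: int, t: int, hyps: List[Hypothesis],
         alpha: float = 1.0, beta: float = 1.0) -> float:
    """Sketch of a generalized syllable posterior over an N-best list.

    Sums the weighted score of every hypothesis that contains `syllable`
    on a span overlapping [s, t], normalized by the total over all
    hypotheses (which approximates p(x) here).
    """
    scores = [_path_score(h, alpha, beta) for h in hyps]
    total = _logsumexp(scores)
    matching = [sc for h, sc in zip(hyps, scores)
                if any(a.syllable == syllable and _overlaps(a, s, t) for a in h)]
    if not matching:
        return 0.0
    return math.exp(_logsumexp(matching) - total)


if __name__ == "__main__":
    hyps = [
        [Arc("ba", 0, 9, -10.0, -1.0), Arc("ta", 10, 19, -12.0, -1.0)],
        [Arc("pa", 0, 9, -11.0, -1.0), Arc("ta", 10, 19, -12.0, -1.0)],
    ]
    # A transcribed syllable whose GSPP falls below a tuned threshold
    # would be flagged as a possible transcription error.
    for alpha in (1.0, 0.1):
        print(alpha, round(gspp("ba", 0, 9, hyps, alpha=alpha), 4))
```

Lowering `alpha` flattens the acoustic scores and pushes competing syllables toward equal posteriors, which is why the acoustic exponential weight has to be tuned. Sweeping the decision threshold on labeled data and reading off the point where false acceptances equal false rejections would give an EER figure of the kind reported above.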
Bibliographic reference. Wang, Lijuan / Zhao, Yong / Chu, Min / Soong, Frank K. / Cao, Zhigang (2005): "Phonetic transcription verification with generalized posterior probability", In INTERSPEECH-2005, 1949-1952.