Accurate phonetic transcription is critical to high-quality, concatenation-based text-to-speech (TTS) synthesis. In this paper, we propose using generalized syllable posterior probability (GSPP) as a statistical confidence measure to verify phonetic transcriptions in a TTS speech database and to detect errors such as reading errors, inadequate pronunciation alternatives in the lexicon, letter-to-sound errors in transcribing out-of-vocabulary words, and idiosyncratic pronunciations. GSPP is computed from a syllable graph generated by a recognition decoder. In tests on two data sets, the proposed GSPP is shown to be effective in locating phonetic transcription errors: equal error rates (EERs) of 8.2% and 8.4% are obtained on the two test sets, respectively. The GSPP verification performance is also found to be fairly stable over a wide range around the optimal value of the acoustic model exponential weight used in computing GSPP.
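As a rough illustration of how an exponentially weighted posterior of this kind might be computed, the sketch below scores a syllable over a small N-best list rather than a full syllable graph. It is a simplification, not the paper's implementation: the function name `gspp_nbest`, the path tuple layout, the time-overlap rule, and the default weights `alpha` and `beta` are all assumptions made for illustration.

```python
import math


def gspp_nbest(paths, target, start, end, alpha=0.05, beta=1.0):
    """Toy syllable posterior over an N-best list (hypothetical helper).

    paths : list of (syllables, acoustic_loglik, lm_logprob), where
            syllables is a list of (label, start_frame, end_frame).
    target, start, end : the transcribed syllable and its time span.
    alpha : assumed exponential weight on the acoustic log-likelihood.
    beta  : assumed exponential weight on the language model log-probability.
    """
    def contains_target(syllables):
        # Relaxed time registration: any overlap with [start, end) counts.
        return any(label == target and s < end and t > start
                   for label, s, t in syllables)

    # Weighted log score of each path.
    scores = [alpha * ac + beta * lm for _, ac, lm in paths]

    # Shift by the maximum before exponentiating to avoid underflow.
    shift = max(scores)
    total = sum(math.exp(x - shift) for x in scores)
    hit = sum(math.exp(x - shift)
              for (syllables, _, _), x in zip(paths, scores)
              if contains_target(syllables))
    return hit / total


if __name__ == "__main__":
    nbest = [
        ([("ma", 0, 10)], -100.0, 0.0),
        ([("na", 0, 10)], -120.0, 0.0),
    ]
    print(gspp_nbest(nbest, "ma", 0, 10))
```

A transcribed syllable would then be accepted when its score is at or above a threshold and flagged for review otherwise; sweeping that threshold until the false-acceptance and false-rejection rates are equal gives the EER. Setting `alpha` to 0 in this sketch removes the acoustic evidence entirely, which is one way to see why the choice of acoustic weight matters.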
Cite as: Wang, L., Zhao, Y., Chu, M., Soong, F.K., Cao, Z. (2005) Phonetic transcription verification with generalized posterior probability. Proc. Interspeech 2005, 1949-1952, doi: 10.21437/Interspeech.2005-609
@inproceedings{wang05j_interspeech,
  author={Lijuan Wang and Yong Zhao and Min Chu and Frank K. Soong and Zhigang Cao},
  title={{Phonetic transcription verification with generalized posterior probability}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={1949--1952},
  doi={10.21437/Interspeech.2005-609}
}