Speech intelligibility is one of the most important measures in evaluating a text-to-speech (TTS) synthesizer. To compare, develop, and deploy TTS systems quickly, an automatic objective intelligibility measure is desirable, because human listening tests are labor intensive, inconsistent, and expensive. In this work, we propose an automatic objective intelligibility measure for synthesized speech using template constrained generalized posterior probability (TCGPP). TCGPP is a posterior probability based confidence measure that can identify errors in synthesized speech at a fine level of granularity. Moreover, the TCGPP scores over a test set can be summarized into an overall objective intelligibility metric to compare two synthesizers or to rank multiple TTS systems. We conducted experiments using the synthesized test sentences from all participants in the EH1 English task of the Blizzard Challenge 2010. The results show that the proposed measure correlates highly (corr = 0.9) with subjective scores and rankings.
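As an illustration of the summarization and correlation steps, the following minimal sketch shows how per-word confidence scores might be aggregated into one score per system, used to rank the systems, and compared with subjective scores. It does not compute TCGPP itself. The aggregation by arithmetic mean, the function names (`summarize`, `rank_systems`, `pearson`), and all numeric values are assumptions made for illustration; they are not taken from the paper or from Blizzard Challenge data.

```python
from math import sqrt
from statistics import mean


def summarize(word_scores):
    """Collapse per-word confidence scores into one system-level score.

    Assumption: the arithmetic mean is used; the paper may aggregate differently.
    """
    return mean(word_scores)


def rank_systems(scores_by_system):
    """Return system names ordered from highest to lowest summarized score."""
    overall = {name: summarize(s) for name, s in scores_by_system.items()}
    return sorted(overall, key=overall.get, reverse=True)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up toy values, for illustration only (not experimental results).
toy_scores = {
    "A": [0.90, 0.80, 0.95],
    "B": [0.60, 0.70, 0.50],
    "C": [0.30, 0.40, 0.20],
}
toy_subjective = {"A": 0.92, "B": 0.70, "C": 0.41}

ranking = rank_systems(toy_scores)
objective = [summarize(toy_scores[name]) for name in ranking]
subjective = [toy_subjective[name] for name in ranking]
r = pearson(objective, subjective)
print(ranking, round(r, 3))
```

In this sketch a higher score means higher estimated intelligibility; if the subjective measure were an error rate instead, a strong negative correlation would be the expected sign.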
Index Terms: speech synthesis, objective intelligibility, template constrained generalized posterior probability
Bibliographic reference. Wang, Linfang / Wang, Lijuan / Teng, Yan / Geng, Zhe / Soong, Frank K. (2012): "Objective intelligibility assessment of text-to-speech system using template constrained generalized posterior probability", In INTERSPEECH-2012, 627-630.