Corpus-driven speech synthesis is hampered by the occurrence of occasional glitches which ruin the impression of the whole utterance. We propose an iterative unit selection integrated with an unnatural prosody detection model to identify any unnatural prosody. The system searches an optimal path in the lattice, verifies its naturalness by the unnatural prosody model and replaces the bad section with a better candidate, until it passes the verification test. In light of hypothesis testing, we show this trial-and-error approach takes effective advantage of abundant candidate samples in the database. Also, in contrast to conventional prosody prediction, an unnatural prosody detection model still leaves enough room for the prosody variations. Unnaturalness confidence measures are studied. The combined model can reduce the objective distortion by 16.3%. Perceptual experiments also confirm the proposed approach improves the synthetic speech quality appreciably.
Bibliographic reference. Lin, Dacheng / Zhao, Yong / Soong, Frank K. / Chu, Min / Zhao, Jieyu (2007): "Iterative unit selection with unnatural prosody detection", In INTERSPEECH-2007, 2909-2912.