Using publicly available audiobooks for HMM-TTS poses new challenges. This paper addresses the issue of diverse speech in audiobooks. The aim is to identify diverse speech likely to have a negative effect on HMM-TTS quality. Manual removal of diverse speech was found to yield better synthesis quality despite halving the training corpus. To handle large amounts of data, an automatic approach is proposed that uses a small set of acoustic and text-based features. A series of listening tests showed that the manual selection was most preferred, while the automatic selection was significantly preferred over the full training set.
Bibliographic reference. Braunschweiler, Norbert / Buchholz, Sabine (2011): "Automatic sentence selection from speech corpora including diverse speech for improved HMM-TTS synthesis quality", In INTERSPEECH-2011, 1821-1824.