12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Automatic Sentence Selection from Speech Corpora Including Diverse Speech for Improved HMM-TTS Synthesis Quality

Norbert Braunschweiler, Sabine Buchholz

Toshiba Research Europe Ltd., UK

Using publicly available audiobooks for HMM-TTS poses new challenges. This paper addresses the issue of diverse speech in audiobooks, with the aim of identifying diverse speech that is likely to degrade HMM-TTS quality. Manually removing diverse speech was found to yield better synthesis quality, even though it halved the training corpus. To handle large amounts of data, an automatic approach is proposed that uses a small set of acoustic and text-based features. A series of listening tests showed that the manually selected training set was the most preferred, while the automatically selected set was significantly preferred over the full training set.


Bibliographic reference: Braunschweiler, Norbert / Buchholz, Sabine (2011): "Automatic sentence selection from speech corpora including diverse speech for improved HMM-TTS synthesis quality", in INTERSPEECH-2011, pp. 1821-1824.