We propose a new speech database reduction method that creates efficient speech databases for concatenation-type corpus-based TTS systems. Our aim is to create small speech databases that yield the highest possible quality of speech output. The main points of the proposed method are as follows: (1) It uses a 2-stage algorithm to reduce speech database size. (2) By considering the speech elements actually needed, it selects the most suitable subset of a full-size database; this yields scalable downsized speech databases. A listening test shows that the proposed method can reduce a database from 13 hours to 10 hours with no degradation in output quality. Furthermore, speech synthesized using database sizes of 8 and 6 hours retains a relatively high MOS of more than 3.5, which is 95% of the MOS obtained using the full-size database.
Bibliographic reference. Isogai, Mitsuaki / Mizuno, Hideyuki (2010): "Speech database reduction method for corpus-based TTS system", In INTERSPEECH-2010, 158-161.