ISCA Archive Interspeech 2009

Thousands of voices for HMM-based speech synthesis

Junichi Yamagishi, Bela Usabaev, Simon King, Oliver Watts, John Dines, Jilei Tian, Rile Hu, Yong Guan, Keiichiro Oura, Keiichi Tokuda, Reima Karhila, Mikko Kurimo

Our recent experiments with HMM-based speech synthesis systems have demonstrated that speaker-adaptive HMM-based speech synthesis (which uses an ‘average voice model’ plus model adaptation) is robust to non-ideal speech data: data recorded under various conditions and with varying microphones, data that are not perfectly clean, and/or data that lack phonetic balance. This enables us to consider building high-quality voices from ‘non-TTS’ corpora such as ASR corpora. Since ASR corpora generally include a large number of speakers, this opens up the possibility of producing an enormous number of voices automatically. In this paper we present thousands of voices for HMM-based speech synthesis that we have built from several popular ASR corpora, namely the Wall Street Journal databases (WSJ0/WSJ1/WSJCAM0), Resource Management, Globalphone and Speecon. We report some perceptual evaluation results and outline the outstanding issues.

doi: 10.21437/Interspeech.2009-140

Cite as: Yamagishi, J., Usabaev, B., King, S., Watts, O., Dines, J., Tian, J., Hu, R., Guan, Y., Oura, K., Tokuda, K., Karhila, R., Kurimo, M. (2009) Thousands of voices for HMM-based speech synthesis. Proc. Interspeech 2009, 420-423, doi: 10.21437/Interspeech.2009-140

@inproceedings{yamagishi09_interspeech,
  author={Junichi Yamagishi and Bela Usabaev and Simon King and Oliver Watts and John Dines and Jilei Tian and Rile Hu and Yong Guan and Keiichiro Oura and Keiichi Tokuda and Reima Karhila and Mikko Kurimo},
  title={{Thousands of voices for HMM-based speech synthesis}},
  year={2009},
  booktitle={Proc. Interspeech 2009},
  pages={420--423},
  doi={10.21437/Interspeech.2009-140}
}