11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Conversational Spontaneous Speech Synthesis Using Average Voice Model

Tomoki Koriyama, Takashi Nose, Takao Kobayashi

Tokyo Institute of Technology, Japan

This paper describes conversational spontaneous speech synthesis based on hidden Markov models (HMMs). To reduce the amount of data required for model training, we use the average-voice-based speech synthesis framework, which has been shown to be effective for synthesizing speech in an arbitrary speaker's voice from a small amount of training data. We examine several kinds of average voice models trained on reading-style speech, conversational speech, or both. We also examine an appropriate utterance unit for conversational speech synthesis. Experimental results show that the proposed two-stage model adaptation method improves the quality of synthetic conversational speech.


Bibliographic reference. Koriyama, Tomoki / Nose, Takashi / Kobayashi, Takao (2010): "Conversational spontaneous speech synthesis using average voice model", in INTERSPEECH-2010, 853-856.