8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

An Effective Initial/Final Duration Prediction Method for Corpus-Based Singing Voice Synthesis of Mandarin Chinese

Cheng-Yuan Lin, Pei-Chi Jao, J.-S. Roger Jang

National Tsing Hua University, Taiwan

In this paper, we propose an effective method for predicting initial/final (I/F) durations in corpus-based singing voice synthesis of Mandarin Chinese. The goal of the method is to improve the naturalness and clarity of the synthesized singing voice. To achieve this goal, we construct a separate I/F duration prediction model for each category of consonants. Each model uses a support vector machine to predict duration. To improve accuracy, we use both linguistic/phonetic attributes and music-score information as input features for the I/F duration prediction models. Experimental results demonstrate that the proposed method is effective in predicting I/F durations for singing voice synthesis.
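The per-category modeling idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the category names (`fricative`, `plosive`), the two features (note length in seconds from the score, and a phrase-final flag as a linguistic attribute), and the toy duration values are invented. The regressor is a simplified linear epsilon-insensitive model trained by subgradient descent, used as a stand-in for a support vector machine; the abstract does not state the kernel or settings the authors used.

```python
from collections import defaultdict


def train_linear_svr(X, y, epsilon=0.005, lam=1e-3, lr=1e-3, epochs=500):
    """Fit a linear epsilon-insensitive regressor by subgradient descent.

    Simplified stand-in for an SVM regressor, not the authors' setup.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            # Errors inside the epsilon tube contribute no loss gradient.
            g = 0.0 if abs(err) <= epsilon else (1.0 if err > 0 else -1.0)
            w = [wj - lr * (lam * wj + g * xj) for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def predict(model, x):
    """Predict a duration in seconds from a feature vector."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_per_category(samples):
    """Train one duration model per consonant category."""
    grouped = defaultdict(lambda: ([], []))
    for category, features, duration in samples:
        grouped[category][0].append(features)
        grouped[category][1].append(duration)
    return {cat: train_linear_svr(X, y) for cat, (X, y) in grouped.items()}


# Invented toy data: (category, [note_seconds, is_phrase_final], initial_seconds)
SAMPLES = [
    ("fricative", [0.25, 0.0], 0.060),
    ("fricative", [0.50, 0.0], 0.070),
    ("fricative", [1.00, 0.0], 0.090),
    ("fricative", [1.00, 1.0], 0.100),
    ("fricative", [2.00, 1.0], 0.140),
    ("plosive", [0.25, 0.0], 0.020),
    ("plosive", [0.50, 0.0], 0.022),
    ("plosive", [1.00, 0.0], 0.026),
    ("plosive", [1.00, 1.0], 0.030),
    ("plosive", [2.00, 1.0], 0.038),
]

models = train_per_category(SAMPLES)
short = predict(models["fricative"], [0.25, 0.0])
long_ = predict(models["fricative"], [2.0, 0.0])
print(f"fricative initial: short note {short:.3f} s, long note {long_:.3f} s")
```

Routing each syllable to the model for its consonant category keeps every regressor focused on one articulation class, which is the motivation the abstract gives for building separate models; a kernelized SVM regressor from an established library would be the natural replacement for the toy trainer above.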


Bibliographic reference. Lin, Cheng-Yuan / Jao, Pei-Chi / Jang, J.-S. Roger (2007): "An effective initial/final duration prediction method for corpus-based singing voice synthesis of Mandarin Chinese", in INTERSPEECH-2007, 470-473.