INTERSPEECH 2004 - ICSLP
To construct a natural singing-voice synthesis system, it is important to adequately control acoustic features such as fundamental frequency (F0), spectral shape, and phoneme duration in the synthesis method. This paper identifies the acoustic features that affect singing-voice perception by comparatively analyzing singing and speaking voices, and then proposes a method for transforming a speaking voice into a singing voice using STRAIGHT. The method is composed of an F0 control model for generating the F0 contours of singing voices, a spectral sequence control model for modifying the spectral shapes of the speaking voice, and a duration control model based on rhythm. Results showed that the proposed system could synthesize a natural singing voice whose sound quality is almost the same as that of a real singing voice.
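The abstract does not describe the internals of the three control models, so the following is only a minimal sketch of the kind of score-derived targets that an F0 control model and a rhythm-based duration model might start from: note pitches mapped to Hz and note lengths derived from beats and tempo. All function names, the 5 ms frame period, and the example score are assumptions made for illustration. They are not taken from the paper, and the sketch does not use STRAIGHT.

```python
FRAME_SEC = 0.005  # assumed 5 ms frame period; not a value from the paper


def midi_to_hz(note: int) -> float:
    """Convert a MIDI note number to frequency in Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def note_durations(beats, tempo_bpm):
    """Duration of each note in seconds, from its length in beats and the tempo."""
    return [b * 60.0 / tempo_bpm for b in beats]


def stepwise_f0(notes, beats, tempo_bpm, frame_sec=FRAME_SEC):
    """Frame-level target F0 contour: each note's pitch held for its duration.

    This is only a flat, step-wise melody target. Any dynamics that a real
    F0 control model would add on top of it are deliberately left out.
    """
    contour = []
    for note, dur in zip(notes, note_durations(beats, tempo_bpm)):
        n_frames = round(dur / frame_sec)
        contour.extend([midi_to_hz(note)] * n_frames)
    return contour


# Hypothetical score: C4, D4, E4 as quarter, quarter, half note at 120 BPM
f0 = stepwise_f0([60, 62, 64], [1, 1, 2], 120)
```

Holding the target pitch flat per note keeps the rhythm-to-duration mapping easy to inspect. A system like the one described would then shape this target into a natural-sounding contour and apply it, together with the spectral and duration modifications, during resynthesis.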
Bibliographic reference. Saitou, Takeshi / Tsuji, Naoya / Unoki, Masashi / Akagi, Masato (2004): "Analysis of acoustic features affecting 'singing-ness' and its application to singing-voice synthesis from speaking-voice", In INTERSPEECH-2004, 1925-1928.