8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Analysis of Acoustic Features Affecting "Singing-ness" and Its Application to Singing-Voice Synthesis from Speaking-Voice

Takeshi Saitou, Naoya Tsuji, Masashi Unoki, Masato Akagi

Japan Advanced Institute of Science and Technology, Japan

To construct a natural singing-voice synthesis system, it is important to adequately control acoustic features such as fundamental frequency (F0), spectral shape, and phoneme duration in the synthesis method. This paper identifies acoustic features that affect singing-voice perception through a comparative analysis of singing- and speaking-voices, and then proposes a method for transforming speaking-voice into singing-voice using STRAIGHT. The method is composed of an F0 control model for generating F0 contours of singing-voices, a spectral sequence control model for modifying spectral shapes in speaking-voice, and a duration control model based on rhythm. Results showed that the proposed system could synthesize a natural singing-voice whose sound quality is almost the same as that of a real singing-voice.


Bibliographic reference. Saitou, Takeshi / Tsuji, Naoya / Unoki, Masashi / Akagi, Masato (2004): "Analysis of acoustic features affecting 'singing-ness' and its application to singing-voice synthesis from speaking-voice", in INTERSPEECH-2004, 1925-1928.