A vocal conversion system that can synthesize a singing voice given a speaking voice and a musical score is proposed. It is based on the speech manipulation system STRAIGHT and comprises three models controlling three acoustic features unique to singing voices: the F0, duration, and spectral envelope. Given the musical score and its tempo, the F0 control model generates the F0 contour of the singing voice by controlling four F0 fluctuations: overshoot, vibrato, preparation, and fine fluctuation. The duration control model lengthens the duration of each phoneme in the speaking voice according to the duration of its musical note. The spectral control model converts the spectral envelope of the speaking voice into that of the singing voice by controlling both the singing formant and the amplitude modulation of formants in synchronization with vibrato. Experimental results showed that the proposed system could convert speaking voices into singing voices whose quality resembles that of actual singing voices.
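The F0 and duration control models summarized above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the abstract names the four F0 fluctuations but not their equations. Here, as an assumption, overshoot is modeled by a unit-gain second-order damped system applied to the melody's step function, preparation by running the same system backward in time, vibrato by a fixed-rate sinusoid, and fine fluctuation by low-pass filtered noise. All function names and parameter values (`omega`, `zeta`, `vib_rate`, `vib_depth`, `fine_depth`, the 5 ms frame period) are hypothetical. The duration rule in `lengthen_syllable` (consonant scaled by a fixed ratio, vowel filling the rest of the note) is likewise an assumption. STRAIGHT analysis/resynthesis and the spectral control model are not shown.

```python
import math
import random

FRAME = 0.005  # frame period in seconds (assumed value, 5 ms)


def midi_to_hz(m):
    """Convert a (possibly fractional) MIDI note number to Hz."""
    return 440.0 * 2.0 ** ((m - 69.0) / 12.0)


def note_steps(notes, tempo_bpm):
    """Melody step function in MIDI units, one value per frame.

    notes: list of (midi_pitch, length_in_beats) pairs (hypothetical format).
    """
    sec_per_beat = 60.0 / tempo_bpm
    steps = []
    for pitch, beats in notes:
        n_frames = round(beats * sec_per_beat / FRAME)
        steps.extend([float(pitch)] * n_frames)
    return steps


def damped_second_order(x, omega, zeta):
    """Filter x with y'' + 2*zeta*omega*y' + omega**2*y = omega**2*x (unit DC gain).

    With zeta < 1 the response overshoots each note transition.
    Integrated with semi-implicit Euler at the frame period.
    """
    y, v = x[0], 0.0
    out = []
    for target in x:
        a = omega * omega * (target - y) - 2.0 * zeta * omega * v
        v += a * FRAME
        y += v * FRAME
        out.append(y)
    return out


def f0_contour(notes, tempo_bpm, omega=35.0, zeta=0.5,
               vib_rate=5.5, vib_depth=0.3, fine_depth=0.05, seed=0):
    """Frame-wise F0 contour in Hz (illustrative parameters, assumed)."""
    steps = note_steps(notes, tempo_bpm)
    # Overshoot: forward pass of the damped system.
    smooth = damped_second_order(steps, omega, zeta)
    # Preparation (assumed here): the same system run backward in time,
    # which dips opposite to the coming transition just before the note change.
    smooth = damped_second_order(smooth[::-1], omega, zeta)[::-1]
    rng = random.Random(seed)
    noise_state = 0.0
    contour = []
    for i, m in enumerate(smooth):
        t = i * FRAME
        # Vibrato: sinusoid in semitones, applied uniformly for simplicity.
        vibrato = vib_depth * math.sin(2.0 * math.pi * vib_rate * t)
        # Fine fluctuation: one-pole low-pass filtered Gaussian noise.
        noise_state = 0.9 * noise_state + 0.1 * rng.gauss(0.0, 1.0)
        contour.append(midi_to_hz(m + vibrato + fine_depth * noise_state))
    return contour


def lengthen_syllable(consonant_sec, vowel_sec, note_sec, consonant_ratio=1.0):
    """Stretch a spoken consonant-vowel syllable to fill one note (hypothetical rule).

    The consonant is scaled by a fixed ratio; the vowel absorbs the remainder.
    Returns (consonant_sec, vowel_sec, vowel_stretch_factor).
    """
    c = consonant_sec * consonant_ratio
    v = max(note_sec - c, 0.0)
    return c, v, v / vowel_sec
```

For example, `f0_contour([(60, 1), (62, 1)], tempo_bpm=120)` yields 200 frame-wise F0 values (two one-beat notes of 0.5 s each at 5 ms per frame); such a contour would then replace the speaking voice's F0 before resynthesis. Running one filter in both directions gives a dip before and a peak after each note change from a single parameter set; separate parameters per fluctuation would allow each to be controlled independently.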
| Audio file | Description |
|---|---|
| input_speaking_male.wav | Male speaking voice reading the lyrics of the Japanese children's song "Nanatsunoko". |
| input_speaking_female.wav | Female speaking voice reading the lyrics of the Japanese children's song "Nanatsunoko". |
| synthesized_singing_male.wav | Male singing voice synthesized from input_speaking_male.wav with the proposed system. |
| synthesized_singing_female.wav | Female singing voice synthesized from input_speaking_female.wav with the proposed system. |
Bibliographic reference. Saitou, Takeshi / Goto, Masataka / Unoki, Masashi / Akagi, Masato (2007): "Vocal conversion from speaking voice to singing voice using STRAIGHT", In INTERSPEECH-2007, 4005-4006.