ISCA Archive Interspeech 2007

Vocal conversion from speaking voice to singing voice using STRAIGHT

Takeshi Saitou, Masataka Goto, Masashi Unoki, Masato Akagi

A vocal conversion system that can synthesize a singing voice given a speaking voice and a musical score is proposed. It is based on the speech manipulation system STRAIGHT [1] and comprises three models that control three acoustic features unique to singing voices: the F0, the duration, and the spectral envelope. Given the musical score and its tempo, the F0 control model generates the F0 contour of the singing voice by controlling four F0 fluctuations: overshoot, vibrato, preparation, and fine fluctuation. The duration control model lengthens each phoneme of the speaking voice according to the duration of its musical note. The spectral control model converts the spectral envelope of the speaking voice into that of the singing voice by controlling both the singing formant and the amplitude modulation of formants synchronized with vibrato. Experimental results showed that the proposed system could convert speaking voices into singing voices whose quality resembles that of actual singing voices.
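As a rough illustration of how an F0 contour might be built from a score, the Python sketch below superimposes the four fluctuations named above on a note-level step function in cents. It is not the authors' F0 control model. The function names `second_order` and `singing_f0` are hypothetical, and the following are all assumptions chosen for illustration: an underdamped second-order (mass-spring-damper) response for overshoot; the same response applied to the time-reversed melody for preparation; one-pole low-pass filtered noise for fine fluctuation; a plain sinusoid for vibrato; and every parameter value (frame rate, vibrato rate and depth, natural frequencies, damping ratios).

```python
import math
import random


def second_order(x, fs, freq_hz, damping):
    """Response of y'' + 2*z*w*y' + w^2*y = w^2*x, with w = 2*pi*freq_hz.

    Integrated with semi-implicit Euler at frame rate fs, starting at rest
    on x[0] so the output has no start-up transient. With damping < 1 the
    response overshoots each step in x before settling.
    """
    w = 2.0 * math.pi * freq_hz
    dt = 1.0 / fs
    y, v = x[0], 0.0
    out = []
    for target in x:
        accel = w * w * (target - y) - 2.0 * damping * w * v
        v += accel * dt
        y += v * dt
        out.append(y)
    return out


def singing_f0(notes, fs=200.0, vib_rate=5.5, vib_depth=50.0,
               fine_depth=5.0, seed=0):
    """Hypothetical F0 contour (Hz) for notes given as (MIDI number, seconds).

    All parameter defaults are illustrative assumptions, not published values.
    """
    # Melody as a step function in cents (MIDI note * 100), one value per frame.
    step = []
    for midi, dur in notes:
        step.extend([100.0 * midi] * int(round(dur * fs)))

    # Overshoot: deviation of an underdamped causal response from the step.
    causal = second_order(step, fs, freq_hz=10.0, damping=0.5)
    overshoot = [c - s for c, s in zip(causal, step)]

    # Preparation: the same response run backwards in time, so the contour
    # first moves away from the next note just before each transition.
    anticausal = second_order(step[::-1], fs, freq_hz=10.0, damping=0.6)[::-1]
    preparation = [a - s for a, s in zip(anticausal, step)]

    # Fine fluctuation: one-pole low-pass filtered Gaussian noise, scaled so
    # its steady-state standard deviation is about fine_depth cents.
    rng = random.Random(seed)
    alpha = 0.9
    scale = fine_depth * math.sqrt(1.0 - alpha ** 2) / (1.0 - alpha)
    state, fine = 0.0, []
    for _ in step:
        state = alpha * state + (1.0 - alpha) * rng.gauss(0.0, 1.0)
        fine.append(scale * state)

    f0 = []
    for n, s in enumerate(step):
        # Vibrato: sinusoidal modulation of vib_depth cents at vib_rate Hz.
        vib = vib_depth * math.sin(2.0 * math.pi * vib_rate * n / fs)
        cents = s + overshoot[n] + preparation[n] + vib + fine[n]
        # MIDI 69 (6900 cents) corresponds to A4 = 440 Hz.
        f0.append(440.0 * 2.0 ** ((cents - 6900.0) / 1200.0))
    return f0


# Usage: C4 then E4, half a second each, at 200 frames per second.
contour = singing_f0([(60, 0.5), (64, 0.5)])
print(len(contour))  # 200
```

Treating each fluctuation as a separate additive term in cents keeps the terms independently adjustable; for example, calling `singing_f0` with `vib_depth=0.0` and `fine_depth=0.0` isolates the overshoot and preparation around the note transition.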


Cite as: Saitou, T., Goto, M., Unoki, M., Akagi, M. (2007) Vocal conversion from speaking voice to singing voice using STRAIGHT. Proc. Interspeech 2007, 4005-4006

@inproceedings{saitou07_interspeech,
  author={Takeshi Saitou and Masataka Goto and Masashi Unoki and Masato Akagi},
  title={{Vocal conversion from speaking voice to singing voice using STRAIGHT}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={4005--4006}
}