ISCA Archive Interspeech 2013

A style control technique for singing voice synthesis based on multiple-regression HSMM

Takashi Nose, Misa Kanemoto, Tomoki Koriyama, Takao Kobayashi

This paper proposes a technique for controlling singing style in HMM-based singing voice synthesis. A style control technique based on the multiple-regression HSMM (MRHSMM), originally proposed for HMM-based expressive speech synthesis, is applied to the conventional singing voice synthesis technique. Pitch adaptive training is introduced into the MRHSMM to improve the modeling accuracy of the fundamental frequency (F0) associated with notes. A robust vibrato modeling technique based on a moving average filter is also proposed to reproduce natural-sounding vibrato even when the vibrato in the original singing voice is unclear. Subjective evaluation results show that users can intuitively control the singing style while maintaining the naturalness of the synthetic voice.
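
As a rough illustration of the moving average filter mentioned in the abstract, the sketch below smooths a log-F0 contour and treats the residual as the vibrato component. The abstract does not describe how the paper applies the filter, so this is only a generic example: the function name `moving_average`, the 5 ms frame shift, the 6 Hz vibrato rate, the synthetic contour, and the window length are all assumptions made for illustration.

```python
import math

def moving_average(values, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

# Assumed setup (not from the paper): 5 ms frame shift, i.e. 200 frames/s,
# and a held 440 Hz note with a 6 Hz sinusoidal vibrato in the log-F0 domain.
frame_rate = 200
note = math.log(440.0)
contour = [
    note + 0.02 * math.sin(2 * math.pi * 6.0 * t / frame_rate)
    for t in range(400)
]

# A window of roughly one vibrato period (200 / 6 is about 33 frames)
# averages the oscillation away and leaves the note-level trend.
window = 33
trend = moving_average(contour, window)

# The residual approximates the vibrato component around the trend.
vibrato = [c - s for c, s in zip(contour, trend)]
```

Away from the edges of the contour, `trend` stays close to the note-level log-F0 while `vibrato` keeps the oscillation, which is the kind of separation a smoothing filter makes possible.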


doi: 10.21437/Interspeech.2013-104

Cite as: Nose, T., Kanemoto, M., Koriyama, T., Kobayashi, T. (2013) A style control technique for singing voice synthesis based on multiple-regression HSMM. Proc. Interspeech 2013, 378-382, doi: 10.21437/Interspeech.2013-104

@inproceedings{nose13_interspeech,
  author={Takashi Nose and Misa Kanemoto and Tomoki Koriyama and Takao Kobayashi},
  title={{A style control technique for singing voice synthesis based on multiple-regression HSMM}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={378--382},
  doi={10.21437/Interspeech.2013-104}
}