11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

HMM-Based Singing Voice Synthesis System Using Pitch-Shifted Pseudo Training Data

Ayami Mase, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

Nagoya Institute of Technology, Japan

A statistical parametric approach to singing voice synthesis based on hidden Markov models (HMMs) has been developed over the last few years. In this approach, the spectrum, excitation, and duration of singing voices are simultaneously modeled by context-dependent HMMs, and waveforms are generated from the HMMs themselves. However, pitches that rarely appear in the training data cannot be generated properly because the system cannot model their fundamental frequency (F0) contours. In this paper, we propose a technique for training HMMs using pitch-shifted pseudo data. Subjective listening test results show that the proposed technique improves the naturalness of the synthesized singing voices.
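
As a rough illustration of what pitch-shifted pseudo training data could look like, the sketch below shifts an F0 contour and its note labels by a whole number of semitones. The abstract does not specify the shift amounts, the data representation, or how the shifted data enter training, so everything here is an assumption: the function names (`shift_f0`, `shift_notes`, `make_pseudo_data`), the use of MIDI note numbers as labels, the convention that unvoiced frames are stored as 0.0, and the default shifts of one semitone down and up are all hypothetical.

```python
def shift_f0(f0_hz, semitones):
    """Scale voiced F0 values (Hz) by a semitone offset.

    Assumption: unvoiced frames are stored as 0.0 and are left unchanged.
    """
    ratio = 2.0 ** (semitones / 12.0)  # equal-temperament frequency ratio
    return [f * ratio if f > 0 else 0.0 for f in f0_hz]


def shift_notes(midi_notes, semitones):
    """Shift note labels (assumed to be MIDI note numbers) by the same offset."""
    return [n + semitones for n in midi_notes]


def make_pseudo_data(f0_hz, midi_notes, shifts=(-1, 1)):
    """Build one (F0 contour, note labels) pair per shift.

    The default shifts are illustrative only, not taken from the paper.
    """
    return [(shift_f0(f0_hz, s), shift_notes(midi_notes, s)) for s in shifts]


if __name__ == "__main__":
    f0 = [440.0, 0.0, 442.0]  # toy contour: voiced, unvoiced, voiced
    notes = [69, 69, 69]      # A4 under the MIDI numbering assumption
    for shifted_f0, shifted_notes in make_pseudo_data(f0, notes):
        print(shifted_notes, [round(f, 2) for f in shifted_f0])
```

Shifting the labels together with the contour keeps each pseudo sample consistent with its score, which is what would let a model see pitches that are sparse in the original recordings.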


Bibliographic reference.  Mase, Ayami / Oura, Keiichiro / Nankaku, Yoshihiko / Tokuda, Keiichi (2010): "HMM-based singing voice synthesis system using pitch-shifted pseudo training data", In INTERSPEECH-2010, 845-848.