A statistical parametric approach to singing voice synthesis based on hidden Markov models (HMMs) has been developed over the last few years. In this approach, the spectrum, excitation, and duration of singing voices are modeled simultaneously by context-dependent HMMs, and waveforms are generated from the HMMs themselves. However, pitches that rarely appear in the training data cannot be generated properly because the system cannot model their fundamental frequency (F0) contours. In this paper, we propose a technique for training HMMs using pitch-shifted pseudo data. Subjective listening test results show that the proposed technique improves the naturalness of the synthesized singing voices.
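The idea of pitch-shifted pseudo training data can be illustrated with a minimal sketch. The abstract does not say how the shift is performed, so everything here is an assumption made for illustration: the shift is taken to be a whole number of semitones applied to both the F0 contour and the score notes, unvoiced frames are marked with 0.0, notes are MIDI note numbers, and the function names `shift_f0`, `shift_notes`, and `make_pseudo_data` are hypothetical.

```python
def shift_f0(f0_hz, semitones):
    """Scale voiced F0 values (Hz) by a semitone offset.

    Assumption: unvoiced frames are encoded as 0.0 and are left unchanged.
    """
    ratio = 2.0 ** (semitones / 12.0)  # one semitone is a factor of 2^(1/12)
    return [f * ratio if f > 0.0 else 0.0 for f in f0_hz]


def shift_notes(midi_notes, semitones):
    """Shift score notes (assumed MIDI note numbers) by the same offset,
    so the score labels stay consistent with the shifted F0 contour."""
    return [n + semitones for n in midi_notes]


def make_pseudo_data(f0_hz, midi_notes, shifts=(-1, 1)):
    """Build one (F0 contour, notes) pair per semitone offset in `shifts`.

    The default offsets of one semitone down and up are an illustrative choice.
    """
    return [(shift_f0(f0_hz, s), shift_notes(midi_notes, s)) for s in shifts]


if __name__ == "__main__":
    f0 = [0.0, 220.0, 221.5, 0.0]   # toy contour with unvoiced frames
    notes = [57]                    # A3 in MIDI numbering
    for contour, score in make_pseudo_data(f0, notes):
        print(score, [round(f, 2) for f in contour])
```

Under these assumptions, each shifted copy would be added to the original recordings as extra training material, so that pitches that are rare in the original data are seen more often during training.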
Bibliographic reference. Mase, Ayami / Oura, Keiichiro / Nankaku, Yoshihiko / Tokuda, Keiichi (2010): "HMM-based singing voice synthesis system using pitch-shifted pseudo training data", In INTERSPEECH-2010, 845-848.