The Seventh ISCA Tutorial and Research Workshop on Speech Synthesis

Kyoto, Japan
September 22-24, 2010

HMM-Based Robust Voice Conversion Using Adaptive F0 Quantization

Takashi Nose, Takao Kobayashi

Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Yokohama, Japan

This paper proposes an HMM-based voice conversion (VC) technique that uses quantized F0 symbols as context, obtained by adaptive F0 quantization. In HMM-based VC, an input utterance from a source speaker is decoded into phonetic and prosodic symbol sequences, and the converted speech is generated from the decoded information using the target speaker’s pre-trained, phonetically and prosodically context-dependent HMM. In our previous work, we generated the F0 symbols by quantizing the average log F0 value of each phone using global mean and variance parameters calculated from the training data. In this study, these statistical parameters are instead obtained sentence by sentence, and this adaptive approach enables more robust F0 conversion than our conventional technique. Objective and subjective experimental results for English and Japanese speech show that the proposed adaptive quantization technique gives better F0 conversion performance than the conventional one. Moreover, HMM-based VC is significantly more robust to variation in the source speaker’s individuality than GMM-based VC.
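The difference between global and sentence-by-sentence quantization can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the number of quantization levels, the bin width of one standard deviation, and the clipping of out-of-range values are all assumptions made for illustration, since the abstract does not specify them. The input is assumed to be a list of per-phone average log F0 values for voiced phones only.

```python
import math
from statistics import mean, pstdev


def quantize_f0(phone_log_f0, mu, sigma, num_levels=3, width=1.0):
    """Map each phone's average log F0 to a symbol in 0..num_levels-1.

    Values are normalized with the given mean and standard deviation and
    placed into uniform bins of `width` standard deviations centred on the
    mean (an assumed scheme). Out-of-range values are clipped.
    """
    symbols = []
    for value in phone_log_f0:
        z = (value - mu) / sigma
        index = math.floor(z / width + num_levels / 2)
        symbols.append(min(max(index, 0), num_levels - 1))
    return symbols


def adaptive_quantize(phone_log_f0, num_levels=3, width=1.0):
    """Quantize using statistics of this sentence only (adaptive variant)."""
    mu = mean(phone_log_f0)
    sigma = pstdev(phone_log_f0) or 1.0  # guard against a flat contour
    return quantize_f0(phone_log_f0, mu, sigma, num_levels, width)


# Hypothetical data: the same rising contour, one octave apart.
low = [math.log(f) for f in (100, 120, 140)]
high = [math.log(f) for f in (200, 240, 280)]

# Global statistics taken from the low-pitched data (stand-in for training data).
global_mu, global_sigma = mean(low), pstdev(low)

adaptive_low = adaptive_quantize(low)
adaptive_high = adaptive_quantize(high)
global_high = quantize_f0(high, global_mu, global_sigma)
```

In this toy case the per-sentence variant assigns the same symbol sequence to both contours, while quantizing the high-pitched sentence with the mismatched global statistics clips every phone to the top level. This is one way to picture why sentence-level statistics could make the F0 symbols less sensitive to the source speaker's pitch range.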


Bibliographic reference.  Nose, Takashi / Kobayashi, Takao (2010): "HMM-based robust voice conversion using adaptive F0 quantization", In SSW7-2010, 80-85.