9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Low-Delay Voice Conversion Based on Maximum Likelihood Estimation of Spectral Parameter Trajectory

Takashi Muramatsu, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

NAIST, Japan

Two spectral conversion processes are typically used in voice conversion: 1) frame-based conversion, which converts spectral parameters frame by frame, and 2) trajectory-based conversion, which converts all spectral parameters over an utterance simultaneously. The former allows real-time conversion but sometimes produces inappropriate spectral movements. The latter yields converted spectral parameters with proper dynamic characteristics but inevitably requires batch processing. To achieve real-time conversion that takes spectral dynamic characteristics into account, we propose a time-recursive conversion algorithm based on maximum likelihood estimation of the spectral parameter trajectory. Experimental results show that the proposed method achieves low-delay conversion, e.g., a delay of only one frame, while keeping conversion performance comparable to that of the conventional trajectory-based conversion.
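The sketch below contrasts batch trajectory-based conversion with a low-delay variant. It is a minimal illustration built on several assumptions:

* It uses a one-dimensional spectral feature.
* Gaussian means and variances for the static and delta features are assumed to be already predicted for each frame, for example by a conversion model.
* The delta window is assumed to be Δc_t = 0.5(c_{t+1} - c_{t-1}), clipped at the edges.

The names `mlpg` and `low_delay_mlpg` and the `lookahead`/`history` parameters are hypothetical. The low-delay function simply re-solves the maximum likelihood problem over a sliding window that ends `lookahead` frames after the current frame. It is a simpler stand-in, not the time-recursive algorithm proposed in the paper.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def mlpg(mu_s: Sequence[float], var_s: Sequence[float],
         mu_d: Sequence[float], var_d: Sequence[float]) -> List[float]:
    """Batch ML trajectory (assumed Gaussian model with diagonal variances).

    Minimizes sum_t (c_t - mu_s[t])^2 / var_s[t] + (d_t - mu_d[t])^2 / var_d[t],
    where d_t = 0.5 * (c_{t+1} - c_{t-1}) with indices clipped to [0, T-1].
    The minimizer satisfies the normal equations (W^T P W) c = W^T P mu.
    """
    T = len(mu_s)
    a = [[0.0] * T for _ in range(T)]
    b = [0.0] * T
    for t in range(T):
        # Static-feature term: contributes 1/var_s on the diagonal.
        p = 1.0 / var_s[t]
        a[t][t] += p
        b[t] += p * mu_s[t]
        # Delta-feature term: row of W with -0.5 at t-1 and +0.5 at t+1.
        w = {}
        prev, nxt = max(t - 1, 0), min(t + 1, T - 1)
        w[prev] = w.get(prev, 0.0) - 0.5
        w[nxt] = w.get(nxt, 0.0) + 0.5
        p = 1.0 / var_d[t]
        for i, wi in w.items():
            b[i] += p * wi * mu_d[t]
            for j, wj in w.items():
                a[i][j] += p * wi * wj
    return solve(a, b)


def low_delay_mlpg(mu_s, var_s, mu_d, var_d,
                   lookahead: int = 1, history: int = 10) -> List[float]:
    """Sliding-window approximation: frame t is emitted using only frames
    up to t + lookahead (plus at most `history` past frames)."""
    T = len(mu_s)
    out = []
    for t in range(T):
        lo = max(0, t - history)
        hi = min(T, t + lookahead + 1)
        c = mlpg(mu_s[lo:hi], var_s[lo:hi], mu_d[lo:hi], var_d[lo:hi])
        out.append(c[t - lo])  # keep only the current frame's estimate
    return out
```

With `lookahead=1`, each output frame depends on only one future frame. Because the delta window is clipped at the edges of each sliding window, the output can differ slightly from the batch `mlpg` result. Setting `lookahead` and `history` to at least the utterance length recovers the batch solution.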


Bibliographic reference.  Muramatsu, Takashi / Ohtani, Yamato / Toda, Tomoki / Saruwatari, Hiroshi / Shikano, Kiyohiro (2008): "Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory", In INTERSPEECH-2008, 1076-1079.