Two spectral conversion processes are typical in voice conversion: 1) frame-based conversion, which converts spectral parameters frame by frame, and 2) trajectory-based conversion, which converts all spectral parameters over an utterance simultaneously. The former is capable of real-time conversion but sometimes causes inappropriate spectral movements. The latter yields converted spectral parameters with proper dynamic characteristics, but it requires batch processing. To achieve real-time conversion that takes spectral dynamic characteristics into account, we propose a time-recursive conversion algorithm based on maximum likelihood estimation of the spectral parameter trajectory. Experimental results show that the proposed method achieves low-delay conversion, e.g., a delay of only one frame, while keeping conversion performance comparable to that of conventional trajectory-based conversion.
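The contrast between the two conventional processes can be illustrated with a minimal sketch. It is not the time-recursive algorithm proposed in the paper, whose details the abstract does not give. The sketch assumes a one-dimensional spectral parameter, per-frame Gaussian means and diagonal variances for the static and delta features that are given directly (a real system would derive them from a trained conversion model), and a delta window of 0.5 * (y[t+1] - y[t-1]). The function names `build_window_matrix`, `trajectory_conversion`, and `frame_based_conversion` are hypothetical.

```python
import numpy as np


def build_window_matrix(num_frames):
    """Return the (2T x T) matrix W mapping a static trajectory y to
    interleaved [static_0, delta_0, static_1, delta_1, ...] features.

    Assumed delta window: 0.5 * (y[t+1] - y[t-1]). The delta rows of the
    first and last frame are left at zero, so they add no constraint.
    """
    W = np.zeros((2 * num_frames, num_frames))
    for t in range(num_frames):
        W[2 * t, t] = 1.0  # static feature of frame t
        if 0 < t < num_frames - 1:
            W[2 * t + 1, t - 1] = -0.5  # delta feature of frame t
            W[2 * t + 1, t + 1] = 0.5
    return W


def frame_based_conversion(means):
    """Frame-by-frame conversion: take each static mean independently.

    means has shape (T, 2) with columns [static, delta]; the delta
    column is ignored, so no dynamic constraint is applied.
    """
    return means[:, 0].copy()


def trajectory_conversion(means, variances):
    """Batch maximum likelihood trajectory over the whole utterance.

    Solves (W^T D W) y = W^T D mu, where D is the diagonal precision
    matrix built from `variances` (shape (T, 2), all entries positive).
    """
    num_frames = means.shape[0]
    W = build_window_matrix(num_frames)
    precision = 1.0 / variances.reshape(-1)  # interleaved like the rows of W
    mu = means.reshape(-1)
    A = W.T @ (precision[:, None] * W)
    b = W.T @ (precision * mu)
    return np.linalg.solve(A, b)


# Toy input (assumed): static means jump from 0 to 1, delta means are 0.
means = np.zeros((10, 2))
means[5:, 0] = 1.0
variances = np.column_stack([np.ones(10), np.full(10, 0.1)])

frame_track = frame_based_conversion(means)
smooth_track = trajectory_conversion(means, variances)
print(np.round(frame_track, 3))
print(np.round(smooth_track, 3))
```

The frame-based track reproduces the abrupt jump in the static means, while the trajectory-based track is pulled toward small delta values and changes more gradually. Solving the linear system requires the means and variances of every frame in the utterance, which is the batch requirement the abstract refers to.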
Bibliographic reference. Muramatsu, Takashi / Ohtani, Yamato / Toda, Tomoki / Saruwatari, Hiroshi / Shikano, Kiyohiro (2008): "Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory", In INTERSPEECH-2008, 1076-1079.