The Seventh ISCA Tutorial and Research Workshop on Speech Synthesis

Kyoto, Japan
September 22-24, 2010

Voice Conversion using Precise Speech Alignment based on Spectral Property and Eigen-Codeword Distribution

Yi-Chin Huang, Chung-Hsien Wu, Chung-Han Lee, Yu-Ting Chao

Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan

While voice conversion methods are widely used to convert speech uttered by a source speaker into that of a target speaker, frame-based voice conversion generally suffers from incorrect alignment when only spectral distance is used, and therefore generates improper conversion results. For a parallel phone sequence, aligning frames by minimum spectral distance between the frame-based feature vectors of the source and target phone sequences is theoretically impractical, since the spectral properties of the source and target phones are inherently different. Nevertheless, if the feature vectors of the phone sequence are transformed into codewords in an eigen space, the eigen-codeword occurrence distribution curves of the source and target phone sequences are likely to be similar. By integrating the codeword occurrence distribution into distance estimation, a more precise frame alignment based on dynamic time warping can be obtained. With this precise alignment, voice conversion functions can be properly constructed. Objective and subjective evaluations were conducted, and comparisons with spectral distance-based alignment confirm the improved performance of the proposed method.
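The idea of folding a second, distribution-based term into the dynamic time warping distance can be illustrated with a minimal sketch. The abstract does not give the actual distance formula, so everything specific here is an assumption made for illustration: the function name `dtw_align`, the linear interpolation weight `alpha`, the use of Euclidean distance for both terms, and the per-frame auxiliary vectors `src_aux` and `tgt_aux`, which merely stand in for whatever eigen-codeword occurrence features the paper derives.

```python
import math


def dtw_align(src, tgt, src_aux, tgt_aux, alpha=0.5):
    """Align two frame sequences with standard DTW and a combined local cost.

    src, tgt         : lists of spectral feature vectors (one per frame).
    src_aux, tgt_aux : lists of per-frame auxiliary vectors; a hypothetical
                       stand-in for codeword occurrence distribution features.
    alpha            : hypothetical weight between the two distance terms.

    Returns (path, total_cost), where path is a list of (src_idx, tgt_idx).
    """
    n, m = len(src), len(tgt)

    def local_cost(i, j):
        # Assumed combination: linear interpolation of two Euclidean distances.
        spectral = math.dist(src[i], tgt[j])
        distribution = math.dist(src_aux[i], tgt_aux[j])
        return (1.0 - alpha) * spectral + alpha * distribution

    # acc[i][j] = minimum accumulated cost of aligning src[:i] with tgt[:j].
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
            acc[i][j] = local_cost(i - 1, j - 1) + best_prev

    # Backtrack from the end to recover the frame-to-frame alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = (acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
        step = candidates.index(min(candidates))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path, acc[n][m]


# Toy example: the target repeats its middle frame once.
source = [[0.0], [1.0], [2.0]]
target = [[0.0], [1.0], [1.0], [2.0]]
path, cost = dtw_align(source, target, source, target, alpha=0.5)
```

Setting `alpha=0.0` reduces the sketch to purely spectral alignment, which is the baseline the abstract compares against; how the two terms are actually weighted or normalized in the paper is not stated here.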

Index Terms: Voice conversion, eigen vector, phone alignment


Bibliographic reference.  Huang, Yi-Chin / Wu, Chung-Hsien / Lee, Chung-Han / Chao, Yu-Ting (2010): "Voice conversion using precise speech alignment based on spectral property and eigen-codeword distribution", In SSW7-2010, 62-67.