15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Text-Independent Voice Conversion Using Speaker Model Alignment Method from Non-Parallel Speech

Peng Song, Yun Jin, Wenming Zheng, Li Zhao

Southeast University, China

In this paper, we propose a novel voice conversion method, speaker model alignment (SMA), which does not require parallel training speech. First, the source and target speaker models, each represented by a Gaussian mixture model (GMM), are trained separately. Then, the transformation function for the spectral features is learned by iteratively aligning the components of the source and target speaker models. The transformation function is further combined with a GMM to enable multiple local mappings, and a local consistent GMM (LCGMM) is also considered for model training to improve conversion accuracy. Finally, we carry out experiments to evaluate the performance of the proposed method. Objective and subjective results demonstrate that, compared with the well-known INCA approach, the proposed method achieves lower spectral distortion and higher correlation, and obtains a significant improvement in perceptual quality and similarity.
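The abstract does not spell out how the components are aligned, so the following is only an illustrative sketch of the general idea of iteratively aligning two sets of mixture component means, not the authors' algorithm. It assumes that the GMMs have already been trained, that only the component means are used, that components are paired by nearest neighbour, and that the transformation is a per-dimension affine map. All function names (`nearest`, `fit_affine_1d`, `align_means`) and the toy values are hypothetical.

```python
def nearest(point, candidates):
    """Index of the candidate closest to point (squared Euclidean distance)."""
    return min(
        range(len(candidates)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, candidates[j])),
    )


def fit_affine_1d(xs, ys):
    """Least-squares fit of y = a * x + b for one feature dimension."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0.0:
        # Degenerate case: all xs identical, fall back to a pure shift.
        return 1.0, my - mx
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx


def align_means(src_means, tgt_means, max_iter=20):
    """Alternate between pairing components and refitting the transform.

    Assumption: a diagonal affine map y[d] = a[d] * x[d] + b[d]; the paper's
    actual transformation function may differ.
    """
    dim = len(src_means[0])
    a, b = [1.0] * dim, [0.0] * dim  # start from the identity transform
    pairing = None
    for _ in range(max_iter):
        # Move the source means with the current transform, then pair each
        # one with its nearest target mean.
        moved = [[a[d] * m[d] + b[d] for d in range(dim)] for m in src_means]
        new_pairing = [nearest(m, tgt_means) for m in moved]
        if new_pairing == pairing:
            break  # pairing is stable, so the transform would not change
        pairing = new_pairing
        # Refit the transform on the current pairs, one dimension at a time.
        for d in range(dim):
            xs = [m[d] for m in src_means]
            ys = [tgt_means[j][d] for j in pairing]
            a[d], b[d] = fit_affine_1d(xs, ys)
    return a, b, pairing


# Toy component means (made up for illustration, not data from the paper).
src = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
tgt = [[0.2, 4.6], [0.2, 0.2], [4.6, 0.2]]
scale, shift, pairs = align_means(src, tgt)
```

In this toy case the target means are a scaled and shifted copy of the source means in a different order, so the loop recovers both the pairing and the map. With real speaker models the pairing depends heavily on the starting point, which is one reason to treat this as a sketch rather than a working conversion system.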

Bibliographic reference. Song, Peng / Jin, Yun / Zheng, Wenming / Zhao, Li (2014): "Text-independent voice conversion using speaker model alignment method from non-parallel speech", in INTERSPEECH-2014, 2308-2312.