Most existing voice conversion methods compute the optimal transformation function from a given set of paired acoustic vectors from the source and target speakers. Aligning phonetically equivalent source and target frames is problematic when the available training corpus is not parallel, even though this is the most realistic situation. The alignment task is even more difficult in cross-lingual applications, because the phoneme sets of the languages involved may differ. In this paper, a new iterative alignment method based on acoustic distances is proposed. The method is shown to be suitable for text-independent and cross-lingual voice conversion, and the conversion scores obtained in our evaluation experiments are not far from those achieved with parallel training corpora.
Cite as: Erro, D., Moreno, A. (2007) Frame alignment method for cross-lingual voice conversion. Proc. Interspeech 2007, 1969-1972, doi: 10.21437/Interspeech.2007-551
@inproceedings{erro07b_interspeech,
  author={Daniel Erro and Asunción Moreno},
  title={{Frame alignment method for cross-lingual voice conversion}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={1969--1972},
  doi={10.21437/Interspeech.2007-551}
}