8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Frame Alignment Method for Cross-Lingual Voice Conversion

Daniel Erro, Asunción Moreno

Universitat Politècnica de Catalunya, Spain

Most existing voice conversion methods compute the optimal transformation function from a given set of paired acoustic vectors of the source and target speakers. Aligning phonetically equivalent source and target frames is problematic when the available training corpus is not parallel, even though this is the most realistic situation. The alignment task is even more difficult in cross-lingual applications, because the phoneme sets of the languages involved may differ. In this paper, a new iterative alignment method based on acoustic distances is proposed. The method is shown to be suitable for text-independent and cross-lingual voice conversion, and the conversion scores obtained in our evaluation experiments are close to the performance achieved with parallel training corpora.

