16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Reconstructing Voices Within the Multiple-Average-Voice-Model Framework

Pierre Lanchantin (1), Christophe Veaux (2), Mark J. F. Gales (1), Simon King (2), Junichi Yamagishi (2)

(1) University of Cambridge, UK
(2) University of Edinburgh, UK

Personalisation of voice output communication aids (VOCAs) makes it possible to preserve the vocal identity of people suffering from speech disorders. This can be achieved by adapting HMM-based speech synthesis systems using a small amount of adaptation data. When the voice has begun to deteriorate, reconstruction is still possible in the statistical domain by correcting the model parameters affected by the speech disorder. This can be done by replacing them with parameters from a donor's voice, at the risk of losing part of the patient's identity. Recently, the Multiple-Average-Voice-Model (Multiple AVM) framework has been proposed for speaker adaptation. Adaptation is performed via interpolation in a speaker eigenspace spanned by the mean vectors of speaker-adapted AVMs, which can be tuned to the individual speaker. In this paper, we present the benefits of this framework for voice reconstruction: it requires only a very small amount of adaptation data, interpolation can be performed in a clean-speech eigenspace, and the resulting voice can easily be fine-tuned by adjusting the interpolation weights. We illustrate these points with a subjective assessment of the reconstructed voice.
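
As a rough illustration of the interpolation idea described in the abstract, the sketch below forms a reconstructed mean vector as a weighted sum of the mean vectors of several speaker-adapted AVMs. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `interpolate_means`, the two-dimensional toy vectors, and the hand-picked weights are all hypothetical, and how the weights are estimated from adaptation data is not covered here.

```python
def interpolate_means(avm_means, weights):
    """Return the weighted sum of K mean vectors, one per speaker-adapted AVM.

    avm_means: list of K equal-length lists of floats (hypothetical toy data).
    weights:   list of K interpolation weights (assumed given, not estimated here).
    """
    if len(avm_means) != len(weights):
        raise ValueError("need exactly one weight per AVM mean vector")
    dim = len(avm_means[0])
    if any(len(mean) != dim for mean in avm_means):
        raise ValueError("all mean vectors must have the same dimension")
    # Combine the k-th mean vector scaled by the k-th weight, dimension by dimension.
    return [
        sum(w * mean[d] for w, mean in zip(weights, avm_means))
        for d in range(dim)
    ]


# Toy example: three AVM mean vectors in a 2-dimensional parameter space.
avm_means = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [0.5, 0.25, 0.25]
reconstructed = interpolate_means(avm_means, weights)
print(reconstructed)
```

In this picture, fine-tuning the resulting voice amounts to changing the entries of `weights` and recomputing the interpolated mean.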

Bibliographic reference. Lanchantin, Pierre / Veaux, Christophe / Gales, Mark J. F. / King, Simon / Yamagishi, Junichi (2015): "Reconstructing voices within the multiple-average-voice-model framework", in INTERSPEECH-2015, 2232-2236.