INTERSPEECH 2014
15th Annual Conference of the International Speech Communication Association

Singapore
September 14-18, 2014

Voice Expression Conversion with Factorised HMM-TTS Models

Javier Latorre, Vincent Wan, Kayoko Yanagisawa

Toshiba Research Europe, UK

This paper proposes a method to modify the expression or emotion in a sample of speech without altering the speaker's identity. The method exploits a statistical speech model that factorises the speaker identity from expressions using linear transforms. For this approach, the set of transforms that best fits the speaker and expression of the input speech sample is learned. These transforms are then combined with the expression transforms of the desired expression, taken from another speaker. Since the combined expression transform is factorised and contains information about expression only, it may be applied to the original speech sample to change its expression to the desired one without altering the speaker's identity. Notably, this method may be applied to any voice without the need for a parallel training corpus.
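The core idea, combining a factorised speaker transform with an expression transform borrowed from another speaker, can be sketched with simple affine (CMLLR-style) feature transforms. This is a conceptual illustration only: the names, dimensions, and the explicit inversion step below are assumptions for the sketch, not the paper's actual estimation procedure.

```python
import numpy as np

def make_affine(A, b):
    """Return an affine transform f(x) = A @ x + b."""
    return lambda x: A @ x + b

def compose(f, g):
    """Compose two transforms: apply g first, then f."""
    return lambda x: f(g(x))

rng = np.random.default_rng(0)
dim = 4  # illustrative feature dimension

# Transforms assumed estimated from the input sample:
# speaker identity and source expression (factorised).
A_spk, b_spk = rng.normal(size=(dim, dim)), rng.normal(size=dim)
A_src, b_src = np.eye(dim), rng.normal(size=dim)

# Expression transform of the target expression, taken from another speaker.
A_tgt, b_tgt = 1.1 * np.eye(dim), rng.normal(size=dim)

# Because expression is factorised from identity, conversion amounts to
# undoing the source-expression transform and applying the target one,
# while the speaker transform is left untouched.
A_src_inv = np.linalg.inv(A_src)
undo_src = make_affine(A_src_inv, -A_src_inv @ b_src)
convert = compose(make_affine(A_tgt, b_tgt), undo_src)

x = rng.normal(size=dim)  # a feature vector in the source expression
y = convert(x)            # same speaker identity, target expression
```

The key property the sketch relies on is that expression transforms carry no speaker information, so one speaker's expression transform can be swapped in for another's.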


Bibliographic reference. Latorre, Javier / Wan, Vincent / Yanagisawa, Kayoko (2014): "Voice expression conversion with factorised HMM-TTS models", in INTERSPEECH-2014, 1514-1518.