8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Fast GMM-based Voice Conversion for Text-To-Speech Synthesis Systems

Taoufik En-Najjary (1), Olivier Rosec (1), Thierry Chonavel (2)

(1) FranceTelecom R&D, France
(2) ENST Bretagne, France

Voice conversion (VC) can be seen as a powerful technology for customizing Text-to-Speech (TTS) systems. This paper addresses the integration of a VC method based on a Gaussian Mixture Model (GMM) into a TTS system. Within this framework, we propose an algorithm that reduces the complexity of the VC processing. The main idea is to restrict the conversion function, for each frame, to the most representative components of the GMM and, if necessary, to store the component indices and their associated weights in the acoustic dictionary. The method is evaluated against a classical GMM-based transformation function. Tests show that both methods yield comparable results. Additional experiments indicate that the new technique significantly reduces the computational load of the conversion process.
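
To make the idea of restricting the conversion to the most representative components concrete, here is a minimal NumPy sketch. It assumes the commonly used GMM regression form, in which each component contributes mu_y + Sigma_yx Sigma_xx^{-1} (x - mu_x) weighted by its posterior probability; the abstract does not state the exact function, so this form, the toy model, and all names (`posteriors`, `convert`, `top_k`) are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def posteriors(x, weights, means_x, covs_xx):
    """Posterior probability of each GMM component given source frame x."""
    d = x.shape[0]
    log_p = np.empty(len(weights))
    for i, (w, mu, cov) in enumerate(zip(weights, means_x, covs_xx)):
        diff = x - mu
        _, logdet = np.linalg.slogdet(cov)
        maha = diff @ np.linalg.solve(cov, diff)
        log_p[i] = np.log(w) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
    log_p -= log_p.max()          # stabilise before exponentiating
    p = np.exp(log_p)
    return p / p.sum()

def convert(x, post, means_x, means_y, covs_xx, covs_yx, top_k=None):
    """Convert one frame; with top_k set, keep only the top_k most probable
    components and renormalise their weights (assumed reduction scheme)."""
    if top_k is None:
        idx = np.arange(len(post))
    else:
        idx = np.argsort(post)[::-1][:top_k]
    w = post[idx] / post[idx].sum()
    y = np.zeros(means_y.shape[1])
    for wi, i in zip(w, idx):
        y += wi * (means_y[i]
                   + covs_yx[i] @ np.linalg.solve(covs_xx[i], x - means_x[i]))
    return y, idx, w

# Toy 4-component model in 2 dimensions (hypothetical values).
means_x = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
means_y = means_x + 1.0
covs_xx = np.stack([np.eye(2)] * 4)
covs_yx = np.stack([0.5 * np.eye(2)] * 4)
weights = np.full(4, 0.25)

x = np.array([0.1, -0.2])
post = posteriors(x, weights, means_x, covs_xx)
y_full, _, _ = convert(x, post, means_x, means_y, covs_xx, covs_yx)
y_fast, idx_fast, w_fast = convert(x, post, means_x, means_y,
                                   covs_xx, covs_yx, top_k=1)
```

In this sketch, `idx_fast` and `w_fast` are the per-frame quantities that could be precomputed and stored alongside each unit in the acoustic dictionary, so that the posteriors need not be recomputed at synthesis time. On this well-separated toy model the single-component result matches the full mixture to numerical precision; on real speech data the agreement depends on how concentrated the posteriors are.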

