8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Domain Adaptation Methods in the IBM Trainable Text-to-Speech System

Volker Fischer, Jaime Botella Ordinas, Siegfried Kunzmann

IBM Pervasive Computing, Germany

This paper compares domain adaptation techniques for a unit-selection-based text-to-speech system. The methods under investigation address two different situations, namely the absence and the availability of additional domain-specific training prompts spoken by the original voice talent. In the first case we employ domain-specific pre-selection; in the second we compare a variety of methods that range from a simple extension of the segment inventory to a complete reconstruction of the system, which also includes training decision trees for the domain-dependent prediction of prosody targets. An experimental evaluation shows significant improvements (up to 1.1 points on a 5-point MOS scale) over the baseline system for sentences from the target domain, with no significant degradation when synthesizing sentences from outside the adaptation domain.

Bibliographic reference. Fischer, Volker / Ordinas, Jaime Botella / Kunzmann, Siegfried (2004): "Domain adaptation methods in the IBM trainable text-to-speech system", in INTERSPEECH-2004, 1165-1168.