This paper proposes a technique for speaker and language adaptive training for HMM-based polyglot speech synthesis. Language-specific context dependencies are captured using cluster adaptive training (CAT) with cluster-dependent decision trees. Acoustic variations caused by speaker characteristics are handled by transforms based on constrained maximum likelihood linear regression (CMLLR). This framework allows multi-speaker, multi-language adaptive training and synthesis to be performed within a single model. Experimental results show that the proposed technique achieves better synthesis performance than both speaker-adaptively trained language-dependent models and speaker-adaptively trained language-independent models.
Bibliographic reference. Zen, Heiga (2010): "Speaker and language adaptive training for HMM-based polyglot speech synthesis", In INTERSPEECH-2010, 410-413.
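
As an illustration of how the two adaptation mechanisms named in the abstract could combine, the following minimal sketch evaluates one Gaussian state output: a language-dependent mean is formed as a CAT-style weighted sum of cluster means, and the observation is mapped by a speaker-dependent CMLLR-style affine feature transform. This is a sketch under stated assumptions, not the paper's implementation. All function names, the diagonal covariance, and the toy values are hypothetical, and the per-cluster decision-tree lookup that selects each cluster's mean for a given context is omitted.

```python
import math

def cat_mean(cluster_means, weights):
    """CAT-style mean: weighted sum of per-cluster mean vectors.

    cluster_means[i] is the mean picked from cluster i's own decision
    tree for the current context (the tree lookup is not shown here).
    weights[i] is the language-dependent interpolation weight.
    """
    dim = len(cluster_means[0])
    return [sum(w * mu[d] for w, mu in zip(weights, cluster_means))
            for d in range(dim)]

def cmllr_transform(A, b, obs):
    """CMLLR-style feature transform: returns A * obs + b."""
    return [sum(a * o for a, o in zip(row, obs)) + bias
            for row, bias in zip(A, b)]

def diag_gauss_loglik(x, mean, var):
    """Log density of x under a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def adapted_loglik(obs, A, b, log_det_A, cluster_means, weights, var):
    """Log-likelihood of obs for one (speaker, language) pair.

    The log|A| term is the Jacobian of the feature-space transform;
    it is passed in precomputed to keep the sketch dependency-free.
    """
    mean = cat_mean(cluster_means, weights)
    x = cmllr_transform(A, b, obs)
    return log_det_A + diag_gauss_loglik(x, mean, var)

# Toy values (assumed): two clusters, two-dimensional features.
cluster_means = [[1.0, 2.0], [3.0, 4.0]]
language_weights = [1.0, 0.5]      # first cluster treated as a bias cluster
A = [[2.0, 0.0], [0.0, 1.0]]       # speaker transform matrix
b = [0.5, 0.0]                     # speaker transform bias
log_det_A = math.log(2.0)          # log|det A| for the matrix above

loglik = adapted_loglik([1.0, 4.0], A, b, log_det_A,
                        cluster_means, language_weights, [1.0, 1.0])
```

In this arrangement the language enters only through the interpolation weights and the speaker only through the feature transform, which is one way the two factors can be kept separate during training and recombined at synthesis time.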