Sixth ISCA Workshop on Speech Synthesis

Bonn, Germany
August 22-24, 2007

An HMM-based Bilingual (Mandarin-English) TTS

Hui Liang (1), Yao Qian (2), Frank K. Soong (2)

(1) School of Information Security Engineering, Shanghai Jiaotong University, China
(2) Microsoft Research Asia, Beijing, China

We propose an HMM-based bilingual (Mandarin-English) TTS system. Starting from a simple baseline of two TTS systems built separately from Mandarin and English databases recorded by the same speaker, we construct a new mixed-language TTS system by designing language-specific and language-independent questions that facilitate phone sharing across the two languages. With shared phones, the new system has a smaller footprint than the baseline. Its synthesis quality matches the baseline for monolingual (Mandarin-only or English-only) synthesis and is much better for mixed-language synthesis, as confirmed by preference scores of 59.5% vs. 40.5% in a subjective listening test. We also performed a preliminary experiment in which Mandarin was synthesized using the model parameters in the leaf nodes of the English decision tree, with Kullback-Leibler divergence used to establish a nearest-neighbor mapping between leaf nodes in the decision trees of the two languages. A subjective transcription test of the resulting speech shows a character accuracy of 93.9%.
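
The nearest-neighbor mapping between leaf nodes can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: each leaf is modeled as a single diagonal-covariance Gaussian, the divergence is taken in the direction KL(Mandarin || English), and the leaf names and parameter values are hypothetical toy data. The actual system may use a different divergence form (for example a symmetric or state-level variant).

```python
import math


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two diagonal-covariance Gaussians (closed form)."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def map_leaves(mandarin_leaves, english_leaves):
    """Map each Mandarin leaf to the English leaf with the smallest KL divergence."""
    mapping = {}
    for m_name, (m_mu, m_var) in mandarin_leaves.items():
        mapping[m_name] = min(
            english_leaves,
            key=lambda e: kl_diag_gauss(m_mu, m_var, *english_leaves[e]),
        )
    return mapping


# Hypothetical toy leaves: name -> (mean vector, variance vector).
mandarin_leaves = {
    "man_0": ([0.0, 1.0], [1.0, 1.0]),
    "man_1": ([5.0, -2.0], [0.5, 2.0]),
}
english_leaves = {
    "eng_0": ([4.8, -1.5], [0.6, 1.8]),
    "eng_1": ([0.2, 0.9], [1.1, 0.9]),
    "eng_2": ([10.0, 10.0], [1.0, 1.0]),
}

mapping = map_leaves(mandarin_leaves, english_leaves)
print(mapping)
```

In this sketch, Mandarin speech would then be generated from the parameters of the mapped English leaves. Because KL divergence is asymmetric, reversing the direction or symmetrizing it could yield a different mapping.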


