We propose an HMM-based bilingual (Mandarin and English) TTS system. Starting from a simple baseline of two TTS systems built separately from Mandarin and English databases recorded by the same speaker, we construct a new mixed-language TTS system by designing language-specific and language-independent questions to facilitate phone sharing across the two languages. With shared phones, the new system has a smaller footprint than the baseline system. For non-mixed (Mandarin-only or English-only) synthesis, its quality matches that of the baseline; for mixed-language synthesis, its quality is much better. The higher quality of mixed-language synthesis is confirmed by preference scores of 59.5% vs. 40.5% obtained in a subjective listening test. A preliminary Mandarin synthesis experiment was also performed using the model parameters in the leaf nodes of the English decision tree, where Kullback-Leibler divergence is used to establish a nearest-neighbor mapping between leaf nodes in the decision trees of the two languages. A subjective transcription test shows a character accuracy of 93.9%.
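The nearest-neighbor mapping between decision-tree leaf nodes mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each leaf node is summarized by a single diagonal-covariance Gaussian and that a symmetric Kullback-Leibler divergence is used as the distance; the abstract specifies neither detail. The function names, leaf names, and toy parameter values are hypothetical and are not taken from the paper.

```python
import math

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two diagonal-covariance Gaussians given as mean/variance lists."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total

def symmetric_kl(p, q):
    """Symmetric divergence KL(p || q) + KL(q || p); p and q are (mean, variance) pairs.

    Assumption: the symmetric form is chosen here for illustration only.
    """
    return kl_diag_gauss(*p, *q) + kl_diag_gauss(*q, *p)

def map_leaves(target_leaves, source_leaves):
    """Map each target-language leaf to its nearest source-language leaf."""
    mapping = {}
    for t_name, t_params in target_leaves.items():
        mapping[t_name] = min(
            source_leaves,
            key=lambda s_name: symmetric_kl(t_params, source_leaves[s_name]),
        )
    return mapping

# Hypothetical toy leaves: name -> (mean vector, variance vector).
mandarin_leaves = {
    "man_a": ([0.0, 1.0], [1.0, 1.0]),
    "man_i": ([5.0, -2.0], [0.5, 2.0]),
}
english_leaves = {
    "eng_aa": ([0.2, 0.9], [1.1, 0.9]),
    "eng_iy": ([4.8, -1.5], [0.6, 1.8]),
}

mapping = map_leaves(mandarin_leaves, english_leaves)
```

In this sketch, each Mandarin leaf would then be synthesized with the parameters of the English leaf it maps to. A real system would work with full HMM state output distributions (spectrum, F0, duration) rather than two-dimensional toy Gaussians.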
Cite as: Liang, H., Qian, Y., Soong, F.K. (2007) An HMM-based bilingual (Mandarin-English) TTS. Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6), 137-142
@inproceedings{liang07_ssw, author={Hui Liang and Yao Qian and Frank K. Soong}, title={{An HMM-based bilingual (Mandarin-English) TTS}}, year=2007, booktitle={Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6)}, pages={137--142} }