12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Combining Active and Semi-Supervised Learning for Homograph Disambiguation in Mandarin Text-to-Speech Synthesis

Binbin Shen, Zhiyong Wu, Yongxin Wang, Lianhong Cai

Tsinghua University, China

Grapheme-to-phoneme (G2P) conversion is a crucial step in Mandarin text-to-speech (TTS) synthesis, and homograph disambiguation is its core issue. Several machine learning algorithms have been proposed to address this issue by building models from well-annotated training corpora. However, preparing such corpora is laborious and time-consuming, because it requires extensive manual labeling to validate the correct pronunciations of the homographs. This work addresses the problem by introducing active learning (AL) and semi-supervised learning (SSL) algorithms that exploit unlabeled data for the homograph disambiguation task. Experiments show that the proposed framework can greatly reduce the cost of manual labeling while preserving the performance of the trained model.
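The abstract does not specify the query strategy, the self-training criterion, or the classifier, so the following is only a generic sketch of one common way to combine the two ideas, not the authors' framework. It assumes uncertainty sampling for AL (the least confident unlabeled cases go to a human annotator) and confidence-thresholded self-training for SSL (the most confident automatic labels are added to the training set), with a toy naive Bayes classifier over neighboring-character features. All names (`NaiveBayes`, `al_ssl_loop`, `query_k`, `conf_threshold`, the `L=`/`R=` feature encoding) and the toy data for the homograph 行 (hang2 as in 银行 "bank", xing2 as in 步行 "walk") are hypothetical.

```python
import math
from collections import Counter


class NaiveBayes:
    """Toy multinomial naive Bayes over context features (hypothetical stand-in classifier)."""

    def fit(self, examples, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for feats, y in zip(examples, labels):
            self.counts[y].update(feats)
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.counts[c])
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, feats):
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen features
        logp = {}
        for c in self.classes:
            lp = math.log(self.prior[c] / self.n)
            for f in feats:
                # Laplace (add-one) smoothing of feature likelihoods
                lp += math.log((self.counts[c][f] + 1) / (self.totals[c] + v))
            logp[c] = lp
        m = max(logp.values())  # subtract the max for numerical stability
        z = sum(math.exp(x - m) for x in logp.values())
        return {c: math.exp(x - m) / z for c, x in logp.items()}

    def predict(self, feats):
        probs = self.predict_proba(feats)
        return max(probs, key=probs.get)


def al_ssl_loop(labeled, pool, oracle, rounds=3, query_k=1, conf_threshold=0.9):
    """Hypothetical AL+SSL loop: query the least confident, self-label the most confident."""
    labeled, pool, queries = list(labeled), list(pool), 0
    for _ in range(rounds):
        if not pool:
            break
        model = NaiveBayes().fit([x for x, _ in labeled], [y for _, y in labeled])
        scored = []
        for x in pool:
            probs = model.predict_proba(x)
            label = max(probs, key=probs.get)
            scored.append((probs[label], label, x))
        scored.sort(key=lambda t: t[0])  # least confident first
        # Active learning: the annotator (oracle) labels the most uncertain cases.
        for _, _, x in scored[:query_k]:
            labeled.append((x, oracle(x)))
            queries += 1
        # Semi-supervised learning: keep confident automatic labels, defer the rest.
        pool = []
        for conf, label, x in scored[query_k:]:
            if conf >= conf_threshold:
                labeled.append((x, label))
            else:
                pool.append(x)
    model = NaiveBayes().fit([x for x, _ in labeled], [y for _, y in labeled])
    return model, queries, pool


# Toy data for the homograph 行 (hang2 vs. xing2); features are the neighboring characters.
seed = [(("L=银", "R=卡"), "hang2"), (("L=步", "R=街"), "xing2")]
gold = {("L=银", "R=业"): "hang2", ("L=旅", "R=者"): "xing2", ("L=步", "R=道"): "xing2"}
model, queries, leftover = al_ssl_loop(seed, list(gold), gold.get, rounds=2, conf_threshold=0.6)
print(queries, leftover, model.predict(("L=银", "R=卡")))  # prints: 1 [] hang2
```

In this toy run only one of the three pool items is sent to the annotator; the other two are labeled automatically. The two knobs trade annotation cost against label noise: a larger `query_k` spends more manual effort per round, while a lower `conf_threshold` admits more self-labeled (and possibly wrong) examples. Both values here are illustrative only.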


Bibliographic reference.  Shen, Binbin / Wu, Zhiyong / Wang, Yongxin / Cai, Lianhong (2011): "Combining active and semi-supervised learning for homograph disambiguation in Mandarin text-to-speech synthesis", In INTERSPEECH-2011, 2165-2168.