Grapheme-to-phoneme (G2P) conversion is a crucial step in Mandarin text-to-speech (TTS) synthesis, and homograph disambiguation is its core problem. Several machine learning algorithms have been proposed that address this problem by building models from a well-annotated training corpus. However, preparing such a corpus is laborious and time-consuming, because the correct pronunciation of each homograph must be validated by hand. This work addresses the problem by applying active learning (AL) and semi-supervised learning (SSL) to the homograph disambiguation task, making use of unlabeled data. Experiments show that the proposed framework can greatly reduce the cost of manual labeling while preserving the performance of the trained model.
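The abstract does not say how AL and SSL are combined, so the following is only a minimal sketch of one common pattern, not the authors' method. It assumes a classifier that already produces a posterior over candidate pronunciations for each unlabeled homograph instance. Confident predictions are accepted as labels (self-training, an SSL technique), and uncertain ones are sent to a human annotator (uncertainty sampling, an AL technique). The function name `route_samples`, both thresholds, and the example posteriors are hypothetical.

```python
from typing import Dict, List, Tuple


def route_samples(
    posteriors: List[Dict[str, float]],
    auto_threshold: float = 0.95,   # assumed value, not from the paper
    query_threshold: float = 0.60,  # assumed value, not from the paper
) -> Tuple[List[Tuple[int, str]], List[int]]:
    """Split unlabeled homograph instances into self-labeled and human-queried sets.

    posteriors[i] maps each candidate pronunciation of instance i to the
    probability assigned by the current model.
    """
    auto_labeled: List[Tuple[int, str]] = []  # (instance index, accepted pronunciation)
    to_query: List[int] = []                  # instance indices sent to the annotator

    for idx, dist in enumerate(posteriors):
        best_pron, best_prob = max(dist.items(), key=lambda kv: kv[1])
        if best_prob >= auto_threshold:
            # SSL step: trust the model's own confident prediction.
            auto_labeled.append((idx, best_pron))
        elif best_prob < query_threshold:
            # AL step: the model is uncertain, so ask a human.
            to_query.append(idx)
        # Instances between the two thresholds stay in the unlabeled pool.

    return auto_labeled, to_query


# Made-up posteriors for three occurrences of the polyphone 行 (xing2 vs. hang2).
example = [
    {"xing2": 0.98, "hang2": 0.02},
    {"xing2": 0.55, "hang2": 0.45},
    {"xing2": 0.80, "hang2": 0.20},
]
auto_labeled, to_query = route_samples(example)
```

In a full loop, the model would be retrained on the union of the hand-labeled and self-labeled instances, and the routing repeated until the annotation budget is spent. Under this scheme only the uncertain instances reach the annotator, which is where the reduction in manual labeling would come from.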
Bibliographic reference. Shen, Binbin / Wu, Zhiyong / Wang, Yongxin / Cai, Lianhong (2011): "Combining active and semi-supervised learning for homograph disambiguation in Mandarin text-to-speech synthesis", In INTERSPEECH-2011, 2165-2168.