The majority of Chinese characters are monophonic, while a special group of characters, called polyphonic characters, have multiple pronunciations. As a prerequisite for speech-related generative tasks, the correct pronunciation must be identified among several candidates. This process is called polyphone disambiguation. Although the problem has been well explored with both knowledge-based and learning-based approaches, it remains challenging due to the lack of publicly available labeled datasets and the irregular nature of polyphones in Mandarin Chinese. In this paper, we propose a novel semi-supervised learning (SSL) framework for Mandarin Chinese polyphone disambiguation that can potentially leverage unlimited unlabeled text data. We explore the effect of various proxy labeling strategies, including entropy-thresholding and lexicon-based labeling. Qualitative and quantitative experiments demonstrate that our method achieves state-of-the-art performance. In addition, we publish a novel dataset specifically for the polyphone disambiguation task to promote further research.
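As an illustration of the entropy-thresholding idea mentioned above, the following is a minimal sketch and not the paper's implementation. It assumes a model that outputs a probability distribution over candidate pronunciations for each unlabeled sample; the function names, the sample data, and the threshold value of 0.3 are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_proxy_labels(predictions, threshold):
    """Keep unlabeled samples whose predictive entropy is below `threshold`.

    predictions: list of (sample_id, {pinyin: probability}) pairs.
    Returns a list of (sample_id, pinyin) proxy labels, where the label is
    the most probable pronunciation of each retained sample.
    """
    selected = []
    for sample_id, dist in predictions:
        if entropy(dist.values()) < threshold:
            best = max(dist, key=dist.get)
            selected.append((sample_id, best))
    return selected


# Hypothetical model outputs for the polyphonic character 行 (xing2 / hang2).
predictions = [
    ("sent-1", {"xing2": 0.97, "hang2": 0.03}),  # confident prediction
    ("sent-2", {"xing2": 0.55, "hang2": 0.45}),  # uncertain prediction
]

# Threshold chosen arbitrarily for illustration.
proxy = select_proxy_labels(predictions, threshold=0.3)
print(proxy)
```

Under these assumptions, only the confident prediction is kept as a proxy label, while the uncertain one is left unlabeled. In a semi-supervised setup, the retained pairs would typically be added to the training set for a later training round.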
Cite as: Shi, Y., Wang, C., Chen, Y., Wang, B. (2021) Polyphone Disambiguation in Mandarin Chinese with Semi-Supervised Learning. Proc. Interspeech 2021, 4109-4113, doi: 10.21437/Interspeech.2021-502
@inproceedings{shi21d_interspeech,
  author={Yi Shi and Congyi Wang and Yu Chen and Bin Wang},
  title={{Polyphone Disambiguation in Mandarin Chinese with Semi-Supervised Learning}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={4109--4113},
  doi={10.21437/Interspeech.2021-502}
}