Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-Level Embedding Features

Zexin Cai, Yaogen Yang, Chuxiong Zhang, Xiaoyi Qin, Ming Li


This paper describes a conditional neural network architecture for Mandarin Chinese polyphone disambiguation. The system consists of a bidirectional recurrent neural network that acts as a sentence encoder to capture contextual correlations, followed by a prediction network that maps the polyphonic character embeddings, together with the conditions, to the corresponding pronunciations. We obtain the word-level condition from a pre-trained word-to-vector lookup table. One goal of polyphone disambiguation is to address the homograph problem in the front-end processing of Mandarin Chinese text-to-speech systems. Our system achieves an accuracy of 94.69% on a publicly available polyphonic character dataset. To further validate our choice of conditional features, we also investigate polyphone disambiguation systems with conditions at different levels. The experimental results show that both the sentence-level and the word-level conditional embedding features attain good performance for Mandarin Chinese polyphone disambiguation.
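
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a plain tanh recurrent cell, untrained random weights, toy dimensions, and a random vector standing in for the pre-trained word-to-vector lookup. The function names, the parameter dictionary, the example sentence, and the choice to read the sentence-level condition from the encoder states at the polyphone's position are all hypothetical.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    """Small random weights; a trained system would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def rnn_pass(embs, W_in, W_rec):
    """Run a plain tanh RNN over a sequence; return the hidden state at each step."""
    h = [0.0] * len(W_rec)
    states = []
    for x in embs:
        a, b = matvec(W_in, x), matvec(W_rec, h)
        h = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
        states.append(h)
    return states

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(char_embs, position, word_vec, params, candidates):
    """Score candidate pronunciations for the character at `position`."""
    # Bidirectional encoder: one left-to-right pass, one right-to-left pass.
    fwd = rnn_pass(char_embs, params["W_in_f"], params["W_rec_f"])
    bwd = rnn_pass(char_embs[::-1], params["W_in_b"], params["W_rec_b"])[::-1]
    # Assumption: sentence-level condition = encoder states at the polyphone.
    sentence_cond = fwd[position] + bwd[position]
    # Prediction input: character embedding plus both conditions, concatenated.
    features = char_embs[position] + sentence_cond + word_vec
    probs = softmax(matvec(params["W_out"], features))
    return dict(zip(candidates, probs))

rng = random.Random(0)
EMB, HID, WORD = 4, 3, 5                   # toy sizes, chosen arbitrarily
sentence = ["我", "要", "还", "书"]          # "还" can be read huan2 or hai2
candidates = ["huan2", "hai2"]
position = 2                               # index of the polyphonic character

# Random stand-ins for learned character embeddings and a pre-trained word vector.
char_table = {ch: [rng.uniform(-1, 1) for _ in range(EMB)] for ch in sentence}
char_embs = [char_table[ch] for ch in sentence]
word_vec = [rng.uniform(-1, 1) for _ in range(WORD)]

params = {
    "W_in_f": rand_matrix(HID, EMB, rng), "W_rec_f": rand_matrix(HID, HID, rng),
    "W_in_b": rand_matrix(HID, EMB, rng), "W_rec_b": rand_matrix(HID, HID, rng),
    "W_out": rand_matrix(len(candidates), EMB + 2 * HID + WORD, rng),
}
probs = predict(char_embs, position, word_vec, params, candidates)
print(probs)
```

Because the weights are random, the printed probabilities carry no meaning; the sketch only shows how the character embedding, the sentence-level condition, and the word-level condition could be combined before the output layer.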


DOI: 10.21437/Interspeech.2019-1235

Cite as: Cai, Z., Yang, Y., Zhang, C., Qin, X., Li, M. (2019) Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-Level Embedding Features. Proc. Interspeech 2019, 2110-2114, DOI: 10.21437/Interspeech.2019-1235.


@inproceedings{Cai2019,
  author={Zexin Cai and Yaogen Yang and Chuxiong Zhang and Xiaoyi Qin and Ming Li},
  title={{Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-Level Embedding Features}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2110--2114},
  doi={10.21437/Interspeech.2019-1235},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1235}
}