ISCA Archive Interspeech 2006

Chinese input method based on reduced Mandarin phonetic alphabet

Chun-Han Tseng, Chia-Ping Chen

In this paper we study the problem of simplifying the Chinese input method to make it suitable for use on mobile devices. To assess the feasibility of aggressively reducing the number of keystrokes per Chinese character, we compare three input modes: character-based, syllable-based, and first-symbol-based. Specifically, we use these linguistic units as token types and compare the resulting perplexities. With language models trained on data from the ASBC corpus, the perplexity of a data set we collected from online chat and instant messages is 102.6 for the character-based model, 67.7 for the syllable-based model, and 16.3 for the first-symbol-based model. Arguing from the relation between the perplexity and the number of "typical" sentences of a language model, we conclude that on average there are 6 to 7 characters per first symbol in natural Chinese.
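The arithmetic behind the closing estimate can be sketched as follows. This is one reading of the argument, not necessarily the authors' exact derivation: it assumes that the number of typical sentences of length N grows roughly as perplexity to the power N, so the ratio of two perplexities approximates how many finer-grained tokens correspond to one coarser token on average. The dictionary and function names are illustrative; only the three perplexity values come from the abstract.

```python
# Perplexities reported in the abstract for the three token types.
PERPLEXITY = {"character": 102.6, "syllable": 67.7, "first_symbol": 16.3}


def tokens_per_coarse_token(pp_fine: float, pp_coarse: float) -> float:
    """Average number of fine tokens per coarse token.

    Assumption (not stated in the abstract): typical sentences of length N
    number about PP**N, so the per-position ratio is pp_fine / pp_coarse.
    """
    return pp_fine / pp_coarse


chars_per_first_symbol = tokens_per_coarse_token(
    PERPLEXITY["character"], PERPLEXITY["first_symbol"]
)
syllables_per_first_symbol = tokens_per_coarse_token(
    PERPLEXITY["syllable"], PERPLEXITY["first_symbol"]
)

print(f"characters per first symbol: {chars_per_first_symbol:.2f}")
print(f"syllables per first symbol:  {syllables_per_first_symbol:.2f}")
```

Under this assumption the character-to-first-symbol ratio falls between 6 and 7, which matches the range quoted in the abstract.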


doi: 10.21437/Interspeech.2006-252

Cite as: Tseng, C.-H., Chen, C.-P. (2006) Chinese input method based on reduced Mandarin phonetic alphabet. Proc. Interspeech 2006, paper 1944-Mon3FoP.11, doi: 10.21437/Interspeech.2006-252

@inproceedings{tseng06_interspeech,
  author={Chun-Han Tseng and Chia-Ping Chen},
  title={{Chinese input method based on reduced Mandarin phonetic alphabet}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1944-Mon3FoP.11},
  doi={10.21437/Interspeech.2006-252}
}