We applied a neural network language model (NNLM) to Chinese, training and testing it on the 2011 GALE Mandarin evaluation task. Because written Chinese has no word boundaries, we trained several NNLMs that take words, characters, or both as input units, including a word-character hybrid-input NNLM that accepts both words and characters as input. Our best result showed a Character Error Rate (CER) reduction of up to 0.6% absolute (6.3% relative) compared to an unpruned standard 4-gram language model, and of 0.2% absolute (2.6% relative) compared to a word-based NNLM.
Bibliographic reference. Kang, Moonyoung / Ng, Tim / Nguyen, Long (2011): "Mandarin word-character hybrid-input neural network language model", In INTERSPEECH-2011, 625-628.
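As an illustration of the metric quoted in the abstract, the following is a minimal sketch of how a character error rate and its absolute and relative reductions are conventionally computed. It is not the authors' scoring pipeline: the function names (edit_distance, cer, reductions) and the example strings are hypothetical, and the sketch assumes that CER is the character-level Levenshtein distance divided by the number of reference characters, with spaces between words ignored.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference symbols
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate; assumes spaces are only word separators and are dropped."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def reductions(baseline_cer, improved_cer):
    """Return (absolute, relative) reduction of improved_cer with respect to baseline_cer."""
    absolute = baseline_cer - improved_cer
    relative = absolute / baseline_cer
    return absolute, relative
```

Because the error is counted per character, the score does not depend on how the hypothesis is segmented into words, which is why CER is the usual metric for Mandarin recognition. The relative reduction is the absolute reduction divided by the baseline CER, so the same absolute gain corresponds to a larger relative gain on a lower baseline.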