12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Mandarin Word-Character Hybrid-Input Neural Network Language Model

Moonyoung Kang, Tim Ng, Long Nguyen

Raytheon BBN Technologies, USA

We applied a neural network language model (NNLM) to Chinese, training and testing it on the 2011 GALE Mandarin evaluation task. Exploiting the fact that written Chinese has no word boundaries, we trained various NNLMs using words, characters, or both as input units, including a word-character hybrid-input NNLM that accepts both words and characters as input. Our best result showed a Character Error Rate (CER) reduction of up to 0.6% absolute (6.3% relative) compared with an unpruned 4-gram standard language model, and of 0.2% absolute (2.6% relative) compared with a word-based NNLM.
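The results above are reported as CER, the usual metric for Mandarin recognition, because scoring at the character level avoids any dependence on how a word segmenter splits the text. The sketch below illustrates the metric and the absolute/relative distinction. It is an illustrative assumption, not the scoring tool used in this work: the function names `edit_distance`, `cer`, and `relative_reduction` are hypothetical, and stripping whitespace before scoring is an assumed normalization.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference character r
                curr[j - 1] + 1,         # insert hypothesis character h
                prev[j - 1] + (r != h),  # substitute (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[len(hyp)]


def _chars(text: str) -> str:
    # Assumed normalization: drop any word-segmentation spaces so only characters are scored.
    return "".join(text.split())


def cer(refs, hyps) -> float:
    """Corpus-level CER: total character edits divided by total reference characters."""
    errors = sum(edit_distance(_chars(r), _chars(h)) for r, h in zip(refs, hyps))
    total = sum(len(_chars(r)) for r in refs)
    return errors / total


def relative_reduction(baseline: float, system: float) -> float:
    """Relative error reduction: absolute reduction divided by the baseline error rate."""
    return (baseline - system) / baseline


if __name__ == "__main__":
    # Different segmentations of the same characters score as error-free.
    print(cer(["我们 今天 开会"], ["我们 今 天开会"]))
    # One deleted character out of four reference characters.
    print(cer(["今天开会"], ["今天会"]))
```

Because relative reduction divides by the baseline, the same absolute gain corresponds to a larger relative gain when the baseline CER is lower.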


Bibliographic reference.  Kang, Moonyoung / Ng, Tim / Nguyen, Long (2011): "Mandarin word-character hybrid-input neural network language model", In INTERSPEECH-2011, 625-628.