Sixth European Conference on Speech Communication and Technology
This paper is concerned with language modeling (LM) for large vocabulary speech recognition in Mandarin Chinese. Because Chinese has quite distinctive linguistic characteristics, we investigate some novel language modeling techniques. We also borrow some of the techniques that have been applied to other languages. Experiments have been conducted on the Call Home Mandarin, HUB4, and HUB5 corpora obtained from the Linguistic Data Consortium (LDC). The training set consists of 9.8 hours of spontaneous speech and 100K words of text. The test set consists of 1.6 hours of spontaneous speech and 20K words of text. Our results compare favorably with those reported in the literature.
Bibliographic reference. Leung, Roger H.Y. / Choy, Chi-Yan / Leung, Hong C. (1999): "Characteristics of Chinese language models for large vocabulary telephone speech", In EUROSPEECH'99, 1775-1778.