Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Characteristics of Chinese Language Models for Large Vocabulary Telephone Speech

Roger H.Y. Leung, Chi-Yan Choy, Hong C. Leung

Department of Electronic Engineering, The Chinese University of Hong Kong

This paper addresses language modeling (LM) for large vocabulary speech recognition in Mandarin Chinese. Because Chinese has distinctive linguistic characteristics, we investigate some novel language modeling techniques and also borrow techniques that have been applied to other languages. Experiments were conducted on the Call Home Mandarin, HUB4, and HUB5 corpora obtained from the Linguistic Data Consortium (LDC). The training set consists of 9.8 hours of spontaneous speech and 100K words of text. The test set consists of 1.6 hours of spontaneous speech and 20K words of text. Our results compare favorably with those reported in the literature.


Bibliographic reference.  Leung, Roger H.Y. / Choy, Chi-Yan / Leung, Hong C. (1999): "Characteristics of Chinese language models for large vocabulary telephone speech", In EUROSPEECH'99, 1775-1778.