8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Improved Chinese Broadcast News Transcription by Language Modeling with Temporally Consistent Training Corpora and Iterative Phrase Extraction

Pi-Chuan Chang, Shuo-Peng Liao, Lin-shan Lee

National Taiwan University, Taiwan

This paper proposes an iterative method for extracting new Chinese phrases, based on intra-phrase association and context variation statistics. A Chinese language model enhancement framework that includes lexicon expansion is then developed. Extensive experiments on Chinese broadcast news transcription explore how the achievable improvements depend on the degree of temporal consistency of the adaptation corpora. Very encouraging results were obtained, and a detailed analysis is presented.
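The abstract does not define the two statistics, so the following is only an illustrative sketch of what an iterative extraction loop of this kind might look like, not the authors' method. It assumes pointwise mutual information (PMI) as a stand-in for intra-phrase association and left/right neighbour entropy as a stand-in for context variation. All function names, thresholds, and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (natural log) of a Counter of neighbour tokens."""
    total = sum(counter.values())
    return -sum((c / total) * math.log(c / total) for c in counter.values())


def score_pairs(sentences):
    """Map each adjacent token pair to (count, pmi, left_entropy, right_entropy)."""
    unigrams, bigrams = Counter(), Counter()
    left_ctx, right_ctx = defaultdict(Counter), defaultdict(Counter)
    for sent in sentences:
        unigrams.update(sent)
        for i in range(len(sent) - 1):
            pair = (sent[i], sent[i + 1])
            bigrams[pair] += 1
            # Sentence boundaries count as contexts of their own.
            left_ctx[pair][sent[i - 1] if i > 0 else "<s>"] += 1
            right_ctx[pair][sent[i + 2] if i + 2 < len(sent) else "</s>"] += 1
    total = sum(unigrams.values())
    scores = {}
    for (a, b), n in bigrams.items():
        # Assumed association measure: PMI, with the token total used as
        # an approximate normaliser for both unigram and bigram counts.
        pmi = math.log(n * total / (unigrams[a] * unigrams[b]))
        scores[(a, b)] = (n, pmi, entropy(left_ctx[(a, b)]), entropy(right_ctx[(a, b)]))
    return scores


def merge_pairs(sentences, pairs):
    """Greedily rewrite each sentence, joining accepted pairs into one token."""
    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and (sent[i], sent[i + 1]) in pairs:
                out.append(sent[i] + sent[i + 1])
                i += 2
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged


def extract_phrases(sentences, min_count=2, min_pmi=1.0, min_entropy=0.5, max_iters=3):
    """Repeatedly accept strongly associated pairs that occur in varied contexts."""
    phrases = set()
    for _ in range(max_iters):
        accepted = {
            pair
            for pair, (n, pmi, left_h, right_h) in score_pairs(sentences).items()
            if n >= min_count and pmi >= min_pmi
            and left_h >= min_entropy and right_h >= min_entropy
        }
        if not accepted:
            break
        phrases.update(a + b for a, b in accepted)
        # Re-segment so later iterations can build longer phrases.
        sentences = merge_pairs(sentences, accepted)
    return phrases


# Hypothetical toy corpus, segmented into single characters.
corpus = [list("我在台北工作"), list("他去台北旅行"), list("台北很大"), list("你到台北了")]
new_phrases = extract_phrases(corpus)
```

In a lexicon-expansion setting, the returned phrases would be added to the vocabulary and the training text re-segmented before the language model is re-estimated. The thresholds here are arbitrary and would need tuning on real data.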

Bibliographic reference. Chang, Pi-Chuan / Liao, Shuo-Peng / Lee, Lin-shan (2003): "Improved Chinese broadcast news transcription by language modeling with temporally consistent training corpora and iterative phrase extraction", in EUROSPEECH-2003, 421-424.