Sixth International Conference on Spoken Language Processing
(ICSLP 2000)
Beijing, China
October 16-20, 2000
Broadcast News Transcription in Mandarin
Langzhou Chen, Lori Lamel, Gilles Adda, Jean-Luc Gauvain
Spoken Language Processing Group,
LIMSI-CNRS, Orsay, France
This paper describes our work on developing a Mandarin broadcast
news transcription system. The main focus of this work is porting
the LIMSI American English broadcast news transcription system
to Mandarin Chinese. The system
consists of an audio partitioner and an HMM-based continuous
speech recognizer. The acoustic models were trained on
about 24 hours of data from the 1997 Hub4 Mandarin corpus
available from the LDC. The language models were trained on the
transcripts and on the Mandarin Chinese News Corpus, which contains
about 186 million characters. We investigate recognition performance
as a function of lexicon size, with and without tone in
the lexicon, and with a topic-dependent language model. The
transcription character error rate on the DARPA 1997 test set is
18.1% using a lexicon with 3 tone levels and a topic-based language
model.
Bibliographic reference.
Chen, Langzhou / Lamel, Lori / Adda, Gilles / Gauvain, Jean-Luc (2000):
"Broadcast news transcription in Mandarin",
In ICSLP-2000, vol.2, 1015-1018.