8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Large Corpus Experiments for Broadcast News Recognition

Patrick Nguyen, Luca Rigazio, Jean-Claude Junqua

Panasonic Speech Technology Laboratory, USA

This paper investigates the use of a large corpus for training a Broadcast News speech recognizer. A vast body of speech recognition algorithms and mathematical machinery is aimed at smoothing estimates so that models remain accurate when trained on scant amounts of data. In most cases, this research is motivated by a real need for more data. In Broadcast News, however, a large corpus is already available to all LDC members. Until recently, it had not been considered for acoustic training.

We aim to pioneer the use of the largest available speech corpus (1200h) for acoustic training of speech recognition systems. To the best of our knowledge, this is the largest-scale acoustic training ever considered in speech recognition. We obtain an improvement of 1.5% absolute WER over our best standard (200h) training.

Bibliographic reference. Nguyen, Patrick / Rigazio, Luca / Junqua, Jean-Claude (2003): "Large corpus experiments for broadcast news recognition", in EUROSPEECH-2003, 1837-1840.