5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Reducing the OOV Rate in Broadcast News Speech Recognition

Thomas Kemp, Alex Waibel

ISL, University of Karlsruhe, Germany

To achieve the long-term goal of robust, real-time broadcast news transcription, several problems have to be overcome, such as the variety of acoustic conditions and the unlimited vocabulary. In this paper we address the problem of unlimited vocabulary. We show that this problem is more serious for German than it is for English. Using a speech recognition system with a large vocabulary, we dynamically adapt the active vocabulary to the topic of the current news segment. This is done by applying information retrieval (IR) techniques to a large collection of texts automatically gathered from the Internet. The same technique is also used to adapt the language model of the recognition system. The process of vocabulary adaptation and language model retraining is completely unsupervised. We show that dynamic vocabulary adaptation can significantly reduce the out-of-vocabulary (OOV) rate and the word error rate of our broadcast news transcription system, View4You.
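As a rough illustration of the idea summarized above, the following Python sketch computes an OOV rate and extends a base vocabulary with words from topically similar texts. It is not the View4You implementation: the TF-IDF cosine ranking, the use of a first-pass hypothesis as the retrieval query, the toy corpus, and all function and parameter names (`oov_rate`, `adapt_vocabulary`, `top_docs`, `max_new_words`) are assumptions made for this example. Language model adaptation is not covered.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for this sketch)."""
    return text.lower().split()


def oov_rate(tokens, vocabulary):
    """Fraction of running words that are not in the vocabulary."""
    if not tokens:
        return 0.0
    misses = sum(1 for t in tokens if t not in vocabulary)
    return misses / len(tokens)


def tfidf_vectors(docs_tokens):
    """Build sparse TF-IDF vectors (dicts) and the IDF table for a corpus."""
    n = len(docs_tokens)
    df = Counter()
    for toks in docs_tokens:
        df.update(set(toks))
    # Smoothed IDF; the exact weighting scheme is an assumption.
    idf = {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}
    vectors = [
        {w: tf * idf[w] for w, tf in Counter(toks).items()}
        for toks in docs_tokens
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def adapt_vocabulary(base_vocab, query_tokens, corpus, top_docs=2, max_new_words=5):
    """Add the most frequent unseen words of the best-matching texts."""
    docs_tokens = [tokenize(d) for d in corpus]
    vectors, idf = tfidf_vectors(docs_tokens)
    query_vec = {w: tf * idf.get(w, 0.0) for w, tf in Counter(query_tokens).items()}
    ranked = sorted(
        range(len(corpus)),
        key=lambda i: cosine(query_vec, vectors[i]),
        reverse=True,
    )
    counts = Counter()
    for i in ranked[:top_docs]:
        counts.update(t for t in docs_tokens[i] if t not in base_vocab)
    new_words = [w for w, _ in counts.most_common(max_new_words)]
    return set(base_vocab) | set(new_words)


# Toy data (hypothetical): a tiny text collection and a small base vocabulary.
corpus = [
    "the parliament debated the budget and the chancellor defended the budget",
    "the football team won the championship after a late goal",
    "the chancellor met the parliament to discuss the coalition budget",
]
base_vocab = {"the", "and", "a", "to", "after", "parliament", "won"}

# Assumed setup: a first-pass hypothesis containing only in-vocabulary words
# serves as the query; the reference transcript is used only for scoring.
first_pass = tokenize("the parliament to the")
reference = tokenize("the chancellor defended the budget to parliament")

adapted_vocab = adapt_vocabulary(base_vocab, first_pass, corpus)
print("OOV rate before adaptation:", oov_rate(reference, base_vocab))
print("OOV rate after adaptation: ", oov_rate(reference, adapted_vocab))
```

In this toy setting the retrieval step needs no manual labels, which mirrors the unsupervised character of the adaptation described in the abstract. In a real system the added words would also need pronunciations and language model probabilities before they could be recognized.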


Bibliographic reference. Kemp, Thomas / Waibel, Alex (1998): "Reducing the OOV rate in broadcast news speech recognition", in ICSLP-1998, paper 0757.