8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


The 300k LIMSI German Broadcast News Transcription System

Kevin McTait, Martine Adda-Decker


This paper describes improvements to the existing LIMSI German broadcast news transcription system, in particular the extension of its vocabulary from 65k to 300k words. Automatic speech recognition is more difficult for German than for a language such as English, because German's inflectional morphology and highly productive compounding yield many more out-of-vocabulary words for a given vocabulary size. The experiments undertaken to tackle this problem and reduce the transcription error rate include updating the language models, improving the pronunciation models, using semi-automatically constructed pronunciation lexicons, and increasing the size of the system's vocabulary.
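
The link between vocabulary size and out-of-vocabulary (OOV) words can be illustrated with a minimal sketch. It assumes the usual token-level definition of OOV rate (the share of running words in a test text that are missing from the recognizer's word list) and a vocabulary chosen by training-text frequency. The function names `build_vocab` and `oov_rate` and the toy sentences are hypothetical, invented for illustration; they are not taken from the paper and do not reproduce the authors' procedure or data.

```python
from collections import Counter


def build_vocab(train_tokens, size):
    """Return the `size` most frequent word forms in the training tokens."""
    counts = Counter(train_tokens)
    return {word for word, _ in counts.most_common(size)}


def oov_rate(test_tokens, vocab):
    """Fraction of test tokens that are not covered by the vocabulary."""
    if not test_tokens:
        return 0.0
    missing = sum(1 for tok in test_tokens if tok not in vocab)
    return missing / len(test_tokens)


# Toy data, invented for illustration only (not from the paper).
train = "die regierung plant die reform der bundesregierung und die wahl".split()
test = "die bundesregierung plant die regierungsreform".split()

small = build_vocab(train, 3)    # stands in for a small word list
large = build_vocab(train, 10)   # covers every training word form

print(f"small vocabulary OOV rate: {oov_rate(test, small):.2f}")
print(f"large vocabulary OOV rate: {oov_rate(test, large):.2f}")
```

In this toy setting the compound "regierungsreform" stays out of vocabulary even when every training word form is included, which mirrors the point that compounding keeps producing unseen word forms; a larger word list lowers the OOV rate but does not eliminate it.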

Bibliographic reference. McTait, Kevin / Adda-Decker, Martine (2003): "The 300k LIMSI German broadcast news transcription system", in EUROSPEECH-2003, 213-216.