EUROSPEECH 2003 - INTERSPEECH 2003
This paper describes improvements to the existing LIMSI German broadcast news transcription system, in particular the extension of its vocabulary from 65k to 300k words. Automatic speech recognition is more difficult for German than for a language such as English, because German's inflectional morphology and highly generative compounding lead to many more out-of-vocabulary words for a given vocabulary size. The experiments undertaken to tackle this problem and reduce the transcription error rate include updating the language models, improving the pronunciation models, constructing pronunciation lexicons semi-automatically, and increasing the size of the system's vocabulary.
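The relationship between vocabulary size and out-of-vocabulary (OOV) words mentioned above can be illustrated with a minimal sketch. It assumes the vocabulary is simply the most frequent words of a training text, and it uses a tiny invented German word list. The function name `oov_rate` and the toy data are hypothetical and are not taken from the LIMSI system.

```python
from collections import Counter


def oov_rate(train_tokens, test_tokens, vocab_size):
    """Fraction of test tokens missing from the top-`vocab_size` training words."""
    # Assumption: the vocabulary is the `vocab_size` most frequent training words.
    counts = Counter(train_tokens)
    vocab = {word for word, _ in counts.most_common(vocab_size)}
    if not test_tokens:
        return 0.0
    misses = sum(1 for token in test_tokens if token not in vocab)
    return misses / len(test_tokens)


# Toy data (invented for illustration): compounds such as "haustür" and
# "gartentür" are rare or unseen even though their parts are frequent.
train = "das haus das haus das haus die tür die tür haustür".split()
test = "das haus die haustür gartentür".split()

small = oov_rate(train, test, vocab_size=2)  # vocabulary: das, haus
large = oov_rate(train, test, vocab_size=5)  # vocabulary: all five training words
print(f"OOV rate with 2 words: {small:.2f}, with 5 words: {large:.2f}")
```

In this toy case, enlarging the vocabulary lowers the OOV rate, but an unseen compound remains uncovered, which is the kind of effect that motivates a much larger vocabulary for German.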
Bibliographic reference. McTait, Kevin / Adda-Decker, Martine (2003): "The 300k LIMSI German broadcast news transcription system", In EUROSPEECH-2003, 213-216.