12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Morpheme Based Factored Language Models for German LVCSR

Amr El-Desoky Mousa, M. Ali Basha Shaik, Ralf Schlüter, Hermann Ney

RWTH Aachen University, Germany

German is a highly inflectional language in which a large number of words can be generated from the same root. It also makes liberal use of compounding, which leads to high out-of-vocabulary (OOV) rates and poor Language Model (LM) probability estimates. The use of morphemes rather than full words is therefore considered a better choice for language modeling in Large Vocabulary Continuous Speech Recognition (LVCSR), since it achieves better lexical coverage and lower LM perplexities. On the other hand, Factored Language Models (FLMs) are considered a successful approach that allows many information sources to be integrated to obtain better LM probability estimates. In this paper, we try a combined methodology for language modeling in which both morphological decomposition and factored language modeling are used in a single model called a morpheme-based FLM. We obtain around 2.5% relative reduction in Word Error Rate (WER) with respect to a traditional full-word system.
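
The link between compounding and OOV rate can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy word lists, the hand-written decomposition table, the `+` marker on non-final morphemes, and the helper names `oov_rate` and `decompose`. It only shows how an unseen compound can still be covered once it is split into morphemes that were seen in training.

```python
def oov_rate(tokens, vocabulary):
    """Return the fraction of running tokens not covered by the vocabulary."""
    if not tokens:
        return 0.0
    unknown = sum(1 for token in tokens if token not in vocabulary)
    return unknown / len(tokens)


# Hypothetical decomposition table; a real system would derive the splits
# from a morphological analyzer or an unsupervised segmentation tool.
DECOMPOSITION = {
    "Sprachmodell": ["Sprach+", "modell"],
    "Spracherkennung": ["Sprach+", "erkennung"],
    "Worterkennung": ["Wort+", "erkennung"],
}


def decompose(tokens, table):
    """Replace each word by its morphemes; words missing from the table stay whole."""
    morphemes = []
    for token in tokens:
        morphemes.extend(table.get(token, [token]))
    return morphemes


# Toy data (illustrative only): the test compound never occurs in training.
train_words = ["Sprachmodell", "Worterkennung"]
test_words = ["Spracherkennung"]

word_vocab = set(train_words)
morph_vocab = set(decompose(train_words, DECOMPOSITION))

word_oov = oov_rate(test_words, word_vocab)
morph_oov = oov_rate(decompose(test_words, DECOMPOSITION), morph_vocab)

print(f"full-word OOV rate: {word_oov:.2f}")
print(f"morpheme OOV rate:  {morph_oov:.2f}")
```

In this toy setting the full-word vocabulary misses the test compound entirely, while the morpheme vocabulary covers both of its parts. Real corpora will show a much smaller gap, and the size of the effect depends on the decomposition method and vocabulary size.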

Bibliographic reference.  Mousa, Amr El-Desoky / Shaik, M. Ali Basha / Schlüter, Ralf / Ney, Hermann (2011): "Morpheme based factored language models for German LVCSR", In INTERSPEECH-2011, 1445-1448.