12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Compound Word Recombination for German LVCSR

Markus Nußbaum-Thom, Amr El-Desoky Mousa, Ralf Schlüter, Hermann Ney

RWTH Aachen University, Germany

Compound words pose a difficulty for German speech recognition systems because they cause high out-of-vocabulary and word error rates. State-of-the-art approaches augment the language model with fragments of compounds in order to increase lexical coverage and to lower the perplexity and the out-of-vocabulary rate. The fragments are tagged so that subsequent, equally tagged fragments can be concatenated in the recognition result, but this does not guarantee that the recombined units are proper words. Such recombination techniques also neglect the large vocabulary of the language model training data, although it covers most compounds. In this paper, we investigate the use of this vocabulary for recombining compound words in the recognition result. The approach is tested on two large-vocabulary tasks on top of full-word and fragment-based language models and achieves a relative improvement of 3.7% over the baseline compound-sensitive word error rate.
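
The abstract does not spell out the recombination procedure, so the following Python fragment is only a minimal sketch of the general idea, not the authors' method. It assumes a greedy, longest-match merge of adjacent words in the recognition result whenever their concatenation appears in a vocabulary taken from the language model training data. The function name `recombine`, the parameter `max_parts`, the lowercasing of non-initial parts, and the toy vocabulary are all hypothetical choices made for illustration.

```python
def recombine(tokens, vocabulary, max_parts=3):
    """Greedily merge adjacent tokens whose concatenation is a known word.

    Assumption (not from the paper): a merge is accepted only if the joined
    string occurs in `vocabulary`; non-initial parts are lowercased before
    joining, as is usual inside German compounds.
    """
    result = []
    i = 0
    while i < len(tokens):
        merged = None
        # Try the longest span first so that longer compounds take priority.
        for n in range(min(max_parts, len(tokens) - i), 1, -1):
            parts = tokens[i:i + n]
            candidate = parts[0] + "".join(p.lower() for p in parts[1:])
            if candidate in vocabulary:
                merged = candidate
                i += n
                break
        if merged is None:
            # No known compound starts here; keep the token unchanged.
            merged = tokens[i]
            i += 1
        result.append(merged)
    return result


# Toy vocabulary standing in for the language model training vocabulary.
vocabulary = {"Spracherkennung", "Wörterbuch"}
hypothesis = ["die", "Sprach", "Erkennung", "nutzt", "ein", "Wörter", "Buch"]
print(recombine(hypothesis, vocabulary))
# → ['die', 'Spracherkennung', 'nutzt', 'ein', 'Wörterbuch']
```

Unlike tag-based concatenation, a vocabulary check of this kind can only produce units that were actually observed as words, which is the property the abstract emphasizes.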

Bibliographic reference. Nußbaum-Thom, Markus / Mousa, Amr El-Desoky / Schlüter, Ralf / Ney, Hermann (2011): "Compound word recombination for German LVCSR", in INTERSPEECH-2011, 1449-1452.