Compound words are a difficulty for German speech recognition systems because they cause high out-of-vocabulary and word error rates. State-of-the-art approaches augment the language model with fragments of compounds in order to increase lexical coverage and to lower the perplexity and the out-of-vocabulary rate. The fragments are tagged so that consecutive fragments carrying the same tag can be concatenated in the recognition result, but this does not guarantee that the recombined strings are proper words. Such recombination techniques ignore the large vocabulary of the language model training data, although it covers most compounds. In this paper, we investigate the use of this vocabulary for recombining compound words in the recognition result. The approach is tested on two large-vocabulary tasks on top of full-word and fragment-based language models and achieves a relative improvement of 3.7% over the baseline compound-sensitive word error rate.
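The idea of vocabulary-based recombination can be illustrated with a minimal sketch. It is not the method of the paper, whose details the abstract does not give. It assumes a simple greedy rule: adjacent tokens in the recognition result are merged whenever their concatenation appears in the training vocabulary, with longer spans tried first. The function name `recombine`, the parameter `max_parts`, and the example data are hypothetical.

```python
def recombine(tokens, vocabulary, max_parts=4):
    """Greedily merge adjacent tokens whose concatenation is a known word.

    Assumption: a plain case-insensitive lookup in the vocabulary decides
    whether a merge is allowed; linking morphemes are not handled.
    """
    # Map lowercased forms to the spelling used in the vocabulary.
    known = {word.lower(): word for word in vocabulary}
    result = []
    i = 0
    while i < len(tokens):
        merged = False
        # Try the longest span first, down to two tokens.
        for n in range(min(max_parts, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + n]).lower()
            if candidate in known:
                result.append(known[candidate])
                i += n
                merged = True
                break
        if not merged:
            # No known compound starts here; keep the token unchanged.
            result.append(tokens[i])
            i += 1
    return result


# Hypothetical recognition result in which a compound was split.
hypothesis = ["die", "Sprach", "Erkennung", "ist", "schwer"]
training_vocabulary = {"die", "ist", "schwer", "Spracherkennung"}
recombined = recombine(hypothesis, training_vocabulary)
```

A rule this simple would also merge word pairs that happen to form a vocabulary entry without being a compound in context, so a practical system would need additional evidence before accepting a merge.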
Bibliographic reference. Nußbaum-Thom, Markus / Mousa, Amr El-Desoky / Schlüter, Ralf / Ney, Hermann (2011): "Compound word recombination for German LVCSR", In INTERSPEECH-2011, 1449-1452.