ISCA Archive ICSLP 2000

Compound splitting and lexical unit recombination for improved performance of a speech recognition system for German parliamentary speeches

Martha Larson, Daniel Willett, Joachim Köhler, Gerhard Rigoll

This paper proposes a novel combined compound splitting and phrase recombination method that optimizes the composition of the speech recognition lexicon for a given domain. Data-driven compound word splitting is followed by iterative recombination of high-frequency combinations. Language model perplexity and size are the criteria used to identify a balance between compound decomposition, which reduces the out-of-vocabulary (OOV) rate, and lexical unit recombination, which packs additional context into a fixed-size vocabulary. The method provides a basis for lexicon design for a large-vocabulary continuous speech recognition (LVCSR) system in the domain of German parliamentary speeches, which is to serve as the foundation of a spoken document information retrieval system. The approach achieves a 35% reduction in the OOV rate without a prohibitively large sacrifice in recognition performance.
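
The two stages named in the abstract can be illustrated with a toy sketch. It is not the authors' procedure: the split rule (a lookup in a hypothetical `known_parts` set), the merge criterion (a plain frequency threshold, `min_count`, in place of the perplexity and model-size criteria), and all function names are assumptions made for illustration only.

```python
from collections import Counter


def split_compounds(tokens, known_parts):
    """Split a token into two parts when both halves are in known_parts (toy rule)."""
    out = []
    for tok in tokens:
        # Try every split point that leaves at least 3 characters on each side.
        for i in range(3, len(tok) - 2):
            head, tail = tok[:i], tok[i:]
            if head in known_parts and tail in known_parts:
                out.extend([head, tail])
                break
        else:
            # No valid split found: keep the token unchanged.
            out.append(tok)
    return out


def recombine(tokens, min_count, max_iters):
    """Iteratively merge the most frequent adjacent pair into one lexical unit."""
    tokens = list(tokens)
    for _ in range(max_iters):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < min_count:
            break
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + "_" + b)  # joined unit, e.g. "meine_damen"
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens


def oov_rate(test_tokens, vocab):
    """Fraction of test tokens that are not covered by the vocabulary."""
    return sum(t not in vocab for t in test_tokens) / len(test_tokens)
```

In this sketch, splitting shrinks the set of distinct word forms the lexicon has to cover, while recombination spends vocabulary slots on frequent multi-word units; a real system would choose the stopping point by measuring language model perplexity and size rather than by a fixed count.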


doi: 10.21437/ICSLP.2000-690

Cite as: Larson, M., Willett, D., Köhler, J., Rigoll, G. (2000) Compound splitting and lexical unit recombination for improved performance of a speech recognition system for German parliamentary speeches. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 945-948, doi: 10.21437/ICSLP.2000-690

@inproceedings{larson00_icslp,
  author={Martha Larson and Daniel Willett and Joachim Köhler and Gerhard Rigoll},
  title={{Compound splitting and lexical unit recombination for improved performance of a speech recognition system for German parliamentary speeches}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={3},
  pages={945--948},
  doi={10.21437/ICSLP.2000-690}
}