8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Compound Decomposition in Dutch Large Vocabulary Speech Recognition

Roeland Ordelman, Arjan van Hessen, Franciska de Jong

University of Twente, The Netherlands

This paper addresses compound splitting for Dutch in the context of broadcast news transcription. Language models were built from the original texts and from versions of the same texts in which compounds had been decomposed by a data-driven compound splitting algorithm. The language models were compared in terms of out-of-vocabulary (OOV) rate and word error rate on a real-world broadcast news transcription task. The results indicate that compound splitting improves automatic speech recognition (ASR) performance. The best results were obtained when frequent compounds were left undecomposed.
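The abstract does not describe the splitting algorithm itself, so the following is only a minimal illustrative sketch of one common data-driven approach, not the authors' method. It assumes a word is split into two parts when both parts occur in the corpus, and that compounds at or above a frequency threshold are kept whole. The function names, the `min_part_len` and `keep_freq` parameters, and the toy counts are all hypothetical.

```python
def split_compound(word, counts, min_part_len=3, keep_freq=50):
    """Split `word` into two known parts, or return it unchanged.

    `counts` maps words to corpus frequencies (hypothetical data).
    Words with frequency >= keep_freq are treated as frequent
    compounds and are not decomposed.
    """
    if counts.get(word, 0) >= keep_freq:
        return [word]
    best = None
    # Try every split point that leaves both parts long enough.
    for i in range(min_part_len, len(word) - min_part_len + 1):
        head, tail = word[:i], word[i:]
        head_freq = counts.get(head, 0)
        tail_freq = counts.get(tail, 0)
        if head_freq > 0 and tail_freq > 0:
            # Score a split by its rarer part; prefer the highest score.
            score = min(head_freq, tail_freq)
            if best is None or score > best[0]:
                best = (score, [head, tail])
    return best[1] if best else [word]


def oov_rate(tokens, vocabulary):
    """Fraction of tokens that are not in the recognizer vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


if __name__ == "__main__":
    # Toy frequencies, invented for illustration only.
    toy_counts = {
        "spraak": 10,
        "herkenning": 8,
        "spraakherkenning": 2,
        "nieuws": 100,
        "uitzending": 40,
        "nieuwsuitzending": 60,
    }
    print(split_compound("spraakherkenning", toy_counts))
    print(split_compound("nieuwsuitzending", toy_counts))
    print(oov_rate(["spraak", "herkenning", "foo", "bar"], {"spraak", "herkenning"}))
```

In this sketch the rare compound is decomposed into parts that are more likely to be covered by a fixed vocabulary, which is how splitting can lower the OOV rate, while the frequent compound stays intact, mirroring the abstract's observation about frequent compounds.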


Bibliographic reference. Ordelman, Roeland / Hessen, Arjan van / Jong, Franciska de (2003): "Compound decomposition in Dutch large vocabulary speech recognition", In EUROSPEECH-2003, 225-228.