Sixth International Conference on Spoken Language Processing (ICSLP 2000)

Beijing, China
October 16-20, 2000

Placing Structuring Elements in a Word Sequence for Generating New Statistical Language Models

Karl Weilhammer (1), Günther Ruske (2)

(1) Department of Phonetics and Speech Communication, University of Munich, Germany
(2) Human-Machine-Communication, Technical University Munich, Germany

Class-based n-gram language models have been applied successfully in speech technology. We present an automatic method for improving n-gram language models by redistributing structural elements within word sequences. Our algorithm works on textual data consisting of two kinds of text elements: words and structural elements. The order of the words is never changed during the iterations; the algorithm may only insert or delete structural elements between any two items in the data. As a result, unseen n-grams are interpolated by n-grams containing structural elements. We give a detailed description of the algorithm and present first results for a system trained on a small corpus.
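
The two edit operations named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the marker symbol `<S>`, the function names, and the toy sentence are all assumptions made for illustration, and the iteration scheme and optimization criterion of the paper are not reproduced here. The sketch only shows that inserting or deleting a structural element leaves the word order untouched while creating n-grams (here bigrams) that contain the structural element.

```python
from collections import Counter

MARKER = "<S>"  # hypothetical symbol for a structural element (an assumption)


def insert_marker(items, pos):
    """Return a copy of items with a structural element inserted before index pos."""
    return items[:pos] + [MARKER] + items[pos:]


def delete_marker(items, pos):
    """Return a copy of items with the structural element at index pos removed."""
    if items[pos] != MARKER:
        # Words must never be removed; only structural elements may be deleted.
        raise ValueError("only structural elements may be deleted")
    return items[:pos] + items[pos + 1:]


def words_only(items):
    """Strip structural elements, leaving the original word sequence."""
    return [t for t in items if t != MARKER]


def bigrams(items):
    """Count adjacent pairs, including pairs that contain a structural element."""
    return Counter(zip(items, items[1:]))


seq = ["we", "will", "go", "home"]   # toy data, not from the paper
edited = insert_marker(seq, 2)        # place a marker between "will" and "go"
restored = delete_marker(edited, 2)   # remove the same marker again
```

In this toy example the bigram ("will", "go") disappears from the edited sequence and is replaced by ("will", "<S>") and ("<S>", "go"), while `words_only(edited)` still equals the original sequence.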


Bibliographic reference. Weilhammer, Karl / Ruske, Günther (2000): "Placing structuring elements in a word sequence for generating new statistical language models", in ICSLP-2000, vol. 1, 210-213.