8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Incremental and Iterative Monolingual Clustering Algorithms

Sergio Barrachina, Juan Miguel Vilar

Universidad Jaume I, Spain

Speech recognition error rates can be reduced by using better statistical language models. These models can be improved by grouping words into word equivalence classes, and clustering algorithms can perform this grouping automatically. We present an incremental clustering algorithm and two iterative clustering algorithms, and we compare them with previously proposed algorithms. Experimental results show that the two iterative algorithms perform as well as the previous ones. Notably, one of them, which uses the leaving-one-out technique, can automatically determine the optimum number of classes. The incremental algorithm makes use of these iterative algorithms. It achieves the best results among the compared algorithms, behaves most consistently as the number of classes varies, and can also automatically determine the optimum number of classes.
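The abstract does not describe the algorithms themselves. As a rough illustration of what iterative word clustering for a class-based language model involves, the following is a minimal sketch of a generic exchange-style algorithm, not the authors' method. It assumes a class bigram model P(w | v) = P(c(w) | c(v)) * P(w | c(w)), plain maximum-likelihood estimates (no leaving-one-out, so unlike the algorithm described above it cannot choose the number of classes by itself), and brute-force recomputation of the log-likelihood. The function names `class_bigram_loglik` and `exchange_clustering` are hypothetical.

```python
import math
from collections import Counter


def class_bigram_loglik(bigrams, word_class):
    """Log-likelihood of the bigram counts under a class bigram model
    P(w | v) = P(c(w) | c(v)) * P(w | c(w)), using ML estimates.
    (Illustrative assumption, not the model from the paper.)"""
    class_bigram = Counter()  # N(g, h): class bigram counts
    class_pred = Counter()    # N(g, .): class g as predecessor
    word_succ = Counter()     # N(., w): word w as successor
    class_succ = Counter()    # N(., h): class h as successor
    for (v, w), n in bigrams.items():
        g, h = word_class[v], word_class[w]
        class_bigram[g, h] += n
        class_pred[g] += n
        word_succ[w] += n
        class_succ[h] += n
    ll = 0.0
    for (v, w), n in bigrams.items():
        g, h = word_class[v], word_class[w]
        p_class = class_bigram[g, h] / class_pred[g]
        p_word = word_succ[w] / class_succ[h]
        ll += n * math.log(p_class * p_word)
    return ll


def exchange_clustering(tokens, num_classes, max_iters=20):
    """Hypothetical exchange-style clustering: repeatedly move each word
    to the class that most increases the log-likelihood, until no word moves."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = sorted(set(tokens))
    # Deterministic initial assignment: round-robin over the classes.
    word_class = {w: i % num_classes for i, w in enumerate(vocab)}
    best = class_bigram_loglik(bigrams, word_class)
    for _ in range(max_iters):
        moved = False
        for w in vocab:
            original = word_class[w]
            best_class = original
            for c in range(num_classes):
                if c == original:
                    continue
                word_class[w] = c  # tentatively move w to class c
                ll = class_bigram_loglik(bigrams, word_class)
                if ll > best + 1e-12:  # keep only strict improvements
                    best, best_class = ll, c
            word_class[w] = best_class  # commit the best move (or undo)
            moved = moved or best_class != original
        if not moved:  # converged: a full pass made no change
            break
    return word_class, best
```

With a toy corpus such as `"the cat sat on the mat the dog sat on the rug".split()`, calling `exchange_clustering(tokens, 2)` returns a word-to-class mapping together with its log-likelihood, which never decreases from the initial assignment. Because every candidate move rescans all bigrams, this sketch is only practical for tiny vocabularies; a realistic implementation would update the affected class counts incrementally when a word changes class.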


Bibliographic reference.  Barrachina, Sergio / Vilar, Juan Miguel (2003): "Incremental and iterative monolingual clustering algorithms", In EUROSPEECH-2003, 241-244.