7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

A Multi-Class Approach for Modelling out-of-Vocabulary Words

Issam Bazzi, James Glass

MIT Laboratory for Computer Science, USA

In this paper we present a multi-class extension to our approach for modelling out-of-vocabulary (OOV) words [1]. Instead of augmenting the word search space with a single OOV model, we add several OOV models, one for each class of words. We present two approaches for designing the OOV word classes. The first approach relies on using common part-of-speech tags. The second approach is a data-driven two-step clustering procedure, where the first step uses agglomerative clustering to derive an initial class assignment, while the second step uses iterative clustering to move words from one class to another in order to reduce the model perplexity. We present experiments within the JUPITER weather information domain. Results show that the multi-class model significantly improves performance over using a single OOV class. For an OOV detection rate of 70%, the false alarm rate is reduced from 5.3% for a single class to 2.9% for an eight-class model.
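As an illustration only, the second (iterative) clustering step might be sketched as follows. The abstract does not specify the models or the update rule, so everything here is an assumption: words are represented as phone sequences, each class is given an add-one-smoothed phone unigram model, and words are reassigned k-means style to the class whose model scores them highest. The initial assignment, which the abstract says comes from agglomerative clustering, is simply passed in. All function names are hypothetical.

```python
import math
from collections import Counter


def class_models(words, assignment, n_classes, alphabet, smoothing=1.0):
    """Train one smoothed phone unigram model per class (an assumed model type)."""
    models = []
    for k in range(n_classes):
        counts = Counter(p for w, c in zip(words, assignment) if c == k for p in w)
        total = sum(counts.values()) + smoothing * len(alphabet)
        models.append({p: (counts[p] + smoothing) / total for p in alphabet})
    return models


def log_likelihood(word, model):
    """Log probability of a phone sequence under a unigram model."""
    return sum(math.log(model[p]) for p in word)


def perplexity(words, assignment, n_classes):
    """Per-phone perplexity of the words under their assigned class models."""
    alphabet = sorted({p for w in words for p in w})
    models = class_models(words, assignment, n_classes, alphabet)
    log_prob = sum(log_likelihood(w, models[c]) for w, c in zip(words, assignment))
    n_phones = sum(len(w) for w in words)
    return math.exp(-log_prob / n_phones)


def iterative_reassign(words, assignment, n_classes, max_iters=20):
    """Move each word to its best-scoring class until no word changes class."""
    alphabet = sorted({p for w in words for p in w})
    assignment = list(assignment)
    for _ in range(max_iters):
        models = class_models(words, assignment, n_classes, alphabet)
        updated = [
            max(range(n_classes), key=lambda k: log_likelihood(w, models[k]))
            for w in words
        ]
        if updated == assignment:
            break
        assignment = updated
    return assignment


# Toy data: two "a"-only words and two "b"-only words, with one word misplaced.
words = [("a", "a", "a"), ("a", "a"), ("b", "b", "b"), ("b", "b")]
start = [0, 0, 0, 1]
final = iterative_reassign(words, start, n_classes=2)
```

On this toy input the misplaced word moves to the second class and the per-phone perplexity drops. Because the models are smoothed rather than maximum-likelihood estimates, this sketch does not guarantee a perplexity decrease on every iteration.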



Bibliographic reference. Bazzi, Issam / Glass, James (2002): "A multi-class approach for modelling out-of-vocabulary words", in ICSLP-2002, 1613-1616.