7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

State Clustering Improvements for Continuous HMMs in a Spanish Large Vocabulary Recognition System

R. Córdoba, J. Macías-Guarasa, J. Ferreiros, J. M. Montero, José M. Pardo

Universidad Politécnica de Madrid, Spain

In this paper we present a set of improvements applied to a large-vocabulary isolated-word recognition system that uses continuous HMMs. The system was used in the EU-funded IDAS project (LE4-8315), in which an automated, interactive, telephone-based directory assistance service was developed. We cover improvements both in continuous HMM reestimation techniques and in agglomerative clustering for context-dependent models, all applied to our Spanish database. Specifically, we show how a new distance between states can greatly improve the performance of the clustering process. We also present a new clustering strategy based on multiple-Gaussian clustering, which further improved the results. Finally, we present a new method for finding the optimum number of Gaussians for each state, applicable to both context-dependent and context-independent models.
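To make the idea of agglomerative (bottom-up) state clustering concrete, here is a minimal sketch of the generic technique, not the authors' method. It assumes each state is approximated by a single diagonal Gaussian with an occupancy count, and it uses the symmetric Kullback-Leibler divergence as a stand-in distance; the paper's new distance is not described in this abstract. The names `State`, `sym_kl`, `merge`, `agglomerate` and the stopping threshold `max_dist` are hypothetical, and no phonetic-context constraints are applied.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class State:
    """Hypothetical HMM state summarized by one diagonal Gaussian (assumption)."""
    name: str
    mean: List[float]
    var: List[float]
    count: float  # occupancy count from training


def sym_kl(a: State, b: State) -> float:
    """Symmetric KL divergence KL(a||b) + KL(b||a) between diagonal Gaussians.

    Stand-in distance for illustration only; the log terms cancel.
    """
    d = 0.0
    for ma, va, mb, vb in zip(a.mean, a.var, b.mean, b.var):
        diff2 = (ma - mb) ** 2
        d += 0.5 * (va / vb + vb / va - 2.0 + diff2 * (1.0 / va + 1.0 / vb))
    return d


def merge(a: State, b: State) -> State:
    """Merge two states by count-weighted moment matching."""
    n = a.count + b.count
    mean = [(a.count * ma + b.count * mb) / n for ma, mb in zip(a.mean, b.mean)]
    var = [
        (a.count * (va + ma ** 2) + b.count * (vb + mb ** 2)) / n - m ** 2
        for ma, va, mb, vb, m in zip(a.mean, a.var, b.mean, b.var, mean)
    ]
    return State(name=f"{a.name}+{b.name}", mean=mean, var=var, count=n)


def agglomerate(states: List[State], max_dist: float) -> List[State]:
    """Repeatedly merge the closest pair until the closest distance exceeds max_dist."""
    clusters = list(states)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sym_kl(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_dist:
            break  # remaining clusters are too far apart to tie
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

For example, with three one-dimensional states at means 0.0, 0.1 and 5.0 (unit variance, equal counts) and `max_dist=1.0`, the first two are tied into one cluster and the third stays separate. The O(n²) pair search per merge is kept for clarity; a practical system would cache pairwise distances.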


Bibliographic reference.  Córdoba, R. / Macías-Guarasa, J. / Ferreiros, J. / Montero, J. M. / Pardo, José M. (2002): "State clustering improvements for continuous HMMs in a Spanish large vocabulary recognition system", In ICSLP-2002, 677-680.