5th International Conference on Spoken Language Processing
In this paper we show that accurate HMMs for connected word recognition can be obtained without context-dependent modeling or discriminative training. To account for different speaking rates, we train two HMMs for each word. Both models have the same standard left-to-right topology, in which one state may be skipped, but they differ in the number of states, which is selected automatically. Our simple modeling and training technique has been applied to connected digit recognition using the adult speaker portion of the TI/NIST corpus. The results are comparable with the best reported in the literature for models with a larger number of densities.
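As an illustration of the topology described in the abstract, the following minimal sketch builds the transition structure of a left-to-right HMM in which each state may loop, advance to the next state, or skip one state. The function name `left_to_right_transitions`, the uniform initial probabilities, and the state counts 6 and 10 are assumptions made for this example; the abstract does not specify how the number of states is selected or how the transitions are initialized.

```python
def left_to_right_transitions(num_states):
    """Build a row-stochastic transition matrix for a left-to-right HMM.

    From state i the model may stay in i, move to i + 1, or skip to i + 2.
    Probabilities are spread uniformly over the allowed targets; this is
    only a hypothetical starting point that training would re-estimate.
    """
    if num_states < 1:
        raise ValueError("num_states must be at least 1")
    matrix = []
    for i in range(num_states):
        # Allowed targets that still fall inside the model.
        targets = [j for j in (i, i + 1, i + 2) if j < num_states]
        row = [0.0] * num_states
        for j in targets:
            row[j] = 1.0 / len(targets)
        matrix.append(row)
    return matrix


# Two models per word with different lengths, e.g. for fast and slow speech.
# The state counts below are placeholders, not values from the paper.
short_model = left_to_right_transitions(6)
long_model = left_to_right_transitions(10)
```

For simplicity the last state only loops on itself; a full recognizer would also need an exit transition to the next word model.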
Bibliographic reference. Chesta, C. / Laface, Pietro / Ravera, F. (1998): "HMM topology selection for accurate acoustic and duration modeling", In ICSLP-1998, paper 0149.