Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Reduced Gaussian Mixture Models in a Large Vocabulary Continuous Speech Recognizer

V. Fischer, T. Ross

IBM Speech Systems, European Speech Research, Heidelberg, Germany

Large vocabulary continuous speech recognition (LVCSR) systems usually employ several tens of thousands of Gaussian mixture components for an accurate statistical representation of naturally spoken human speech. For applications that cannot afford the computationally expensive evaluation of numerous Gaussians at recognition time, an important question is whether the number of Gaussians can be significantly reduced without a large degradation in recognition accuracy. In this paper we introduce two new methods for pruning Gaussians in a continuous density HMM based speech recognizer; they address either the contribution of a Gaussian to the observation likelihood of an HMM state or the reliability of parameter estimation during acoustic model training. Experimental results show that we can reduce the number of mixture components by more than 33 percent, while the speaker independent word error rate shows a relative increase of only 2 percent.
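The abstract does not spell out the two pruning criteria, so the following is only a minimal illustrative sketch, not the authors' method. It assumes that a Gaussian's contribution to a state's observation likelihood can be approximated by its mixture weight, and that the reliability of its parameter estimates can be approximated by its training occupancy count. The function name `prune_state_mixture` and the thresholds `min_weight` and `min_count` are hypothetical.

```python
def prune_state_mixture(weights, counts, min_weight=0.01, min_count=10.0):
    """Prune the Gaussians of one HMM state (illustrative sketch only).

    weights: mixture weights of the state's Gaussians (assumed to sum to 1).
    counts:  training occupancy counts, used here as an ASSUMED proxy for
             how reliably each Gaussian's parameters were estimated.
    Returns (kept_indices, renormalized_weights).
    """
    # Keep a component only if it passes both hypothetical criteria.
    keep = [
        i
        for i, (w, c) in enumerate(zip(weights, counts))
        if w >= min_weight and c >= min_count
    ]

    # Never leave a state without any Gaussian: fall back to the
    # component with the largest weight.
    if not keep:
        keep = [max(range(len(weights)), key=lambda i: weights[i])]

    # Renormalize so the surviving weights again sum to 1.
    total = sum(weights[i] for i in keep)
    new_weights = [weights[i] / total for i in keep]
    return keep, new_weights


if __name__ == "__main__":
    # Toy state with four Gaussians: the third has too little training
    # data and the fourth has a negligible weight.
    kept, renorm = prune_state_mixture(
        weights=[0.5, 0.3, 0.195, 0.005],
        counts=[200.0, 120.0, 4.0, 50.0],
    )
    print(kept, renorm)
```

In this toy case half of the components are removed and the remaining weights are rescaled. A real system would tune such thresholds against word error rate on held-out data.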


Bibliographic reference. Fischer, V. / Ross, T. (1999): "Reduced Gaussian mixture models in a large vocabulary continuous speech recognizer", in EUROSPEECH'99, 1099-1102.