Sixth European Conference on Speech Communication and Technology

The vast majority of HMM-based speech recognition systems use Gaussian mixture models as the state distribution model. This choice is motivated more by ease of training and decoding, and by the fact that a sufficient number of Gaussian components can approximate any distribution, than by any underlying aspect of the data being modelled. If distributions were selected that better modelled the observed data, fewer components should be required and recognition accuracy should improve. This paper examines two distributions for improving the modelling of the tails of the densities. The first, the Richter distribution, fits within the general framework of Gaussian component tying, but has some attractive attributes for decoding. The second, the power exponential distribution, does not fit within a tying framework. Despite gains in likelihood, which indicate that Gaussian components are sub-optimal in a likelihood sense, only small gains in recognition performance were observed on a large vocabulary speech recognition task.
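As a rough illustration of what tail modelling means here, the following one-dimensional sketch compares a single Gaussian with two heavier-tailed alternatives. It is not taken from the paper. It assumes the common description of a Richter-style density as a mixture of Gaussians that share one mean and whose variances are scaled copies of a base variance, and the standard generalised Gaussian form for the power exponential density. The function names, weights, scale factors and shape values are all hypothetical choices made for the example.

```python
import math


def gaussian_pdf(x, mean=0.0, var=1.0):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def richter_style_pdf(x, mean, var, weights, scales):
    """Assumed Richter-style density: Gaussians with a shared mean whose
    variances are `scales[i] * var`, combined with mixture `weights`."""
    return sum(w * gaussian_pdf(x, mean, s * var) for w, s in zip(weights, scales))


def power_exponential_pdf(x, mean=0.0, scale=1.0, shape=2.0):
    """Generalised Gaussian form of the power exponential density.
    shape=2 with scale=sqrt(2)*sigma recovers a Gaussian; shape=1 is Laplacian."""
    norm = shape / (2.0 * scale * math.gamma(1.0 / shape))
    return norm * math.exp(-((abs(x - mean) / scale) ** shape))


if __name__ == "__main__":
    x = 4.0  # a point four standard deviations from the mean
    print("Gaussian:         ", gaussian_pdf(x))
    print("Richter-style:    ", richter_style_pdf(x, 0.0, 1.0, [0.8, 0.2], [1.0, 4.0]))
    print("Power exponential:", power_exponential_pdf(x, shape=1.0))
```

With these illustrative settings both alternatives assign more probability mass far from the mean than the single Gaussian does. Because every component of the Richter-style density shares the same mean, the squared distance term can be computed once and reused across components, which is one plausible reading of the decoding advantage mentioned above.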
Bibliographic reference. Gales, M. J. F. / Olsen, P. A. (1999): "Tail distribution modelling using the Richter and power exponential distributions", In EUROSPEECH'99, 1507-1510.