8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Maximum Entropy Direct Model as a Unified Model for Acoustic Modeling in Speech Recognition

Hong-Kwang Jeff Kuo, Yuqing Gao

IBM T.J. Watson Research Center, USA

Statistical speech recognition has traditionally been dominated by generative models such as Hidden Markov Models (HMMs). We recently proposed a new framework for speech recognition based on maximum entropy direct modeling, in which the probability of a state or word sequence given an observation sequence is computed directly from the model. In contrast to HMMs, the features need not be independent and may be asynchronous and overlapping. In this paper, we discuss how to make the computationally intensive training of such models feasible by parallelizing the Improved Iterative Scaling (IIS) algorithm. Used as a stand-alone acoustic model, the direct model achieves a significantly lower word error rate than traditional HMMs. When its scores are combined with HMM and language model scores, modest improvements over the best HMM system are seen. The maximum entropy model can potentially incorporate non-independent features, such as acoustic phonetic features, in a way that is robust to features that are missing because of mismatch between training and testing conditions.
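As a rough illustration of what "computed directly" means, the sketch below evaluates a conditional maximum entropy (log-linear) posterior p(s | o) over states from overlapping binary features, and accumulates model feature expectations shard by shard. Expectations of this kind are what iterative scaling needs on every pass, and because they are sums over training frames they can be computed on separate shards and then added together. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`posterior`, `shard_expectations`, `merge_expectations`), the feature names, and the phone labels are all hypothetical, and the IIS weight update itself is not shown.

```python
import math
from collections import defaultdict


def posterior(weights, obs_feats, states):
    """Return p(s | o) for a conditional maximum entropy model.

    Each binary feature is a pair (observation feature name, state).
    Observation features may overlap; no independence is assumed.
    """
    scores = {
        s: sum(weights.get((f, s), 0.0) for f in obs_feats) for s in states
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {s: math.exp(v - top) for s, v in scores.items()}
    z = sum(exp_scores.values())  # normalizer Z(o)
    return {s: v / z for s, v in exp_scores.items()}


def shard_expectations(weights, shard, states):
    """Accumulate model expectations E[f] over one shard of frames."""
    expect = defaultdict(float)
    for obs_feats in shard:
        p = posterior(weights, obs_feats, states)
        for s in states:
            for f in obs_feats:
                expect[(f, s)] += p[s]
    return expect


def merge_expectations(parts):
    """Sum per-shard expectations; this is the only cross-shard step."""
    total = defaultdict(float)
    for part in parts:
        for key, value in part.items():
            total[key] += value
    return total


if __name__ == "__main__":
    states = ["AA", "S"]                      # hypothetical phone labels
    weights = {("voiced", "AA"): 1.0}         # hypothetical trained weight
    frames = [["voiced"], ["high_energy", "voiced"], ["high_energy"]]
    print(posterior(weights, frames[1], states))
    parts = [shard_expectations(weights, frames[:1], states),
             shard_expectations(weights, frames[1:], states)]
    print(dict(merge_expectations(parts)))
```

In this sketch a frame that lacks some feature simply contributes no term for it to the score, which hints at why such a model can tolerate missing features. Each call to `shard_expectations` is independent of the others, so in a real setting the shards could be processed by separate workers before merging.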


Bibliographic reference. Kuo, Hong-Kwang Jeff / Gao, Yuqing (2004): "Maximum entropy direct model as a unified model for acoustic modeling in speech recognition", in INTERSPEECH-2004, 681-684.