INTERSPEECH 2004 - ICSLP
Traditional statistical models for speech recognition have been dominated by generative models such as Hidden Markov Models (HMMs). We recently proposed a new framework for speech recognition based on maximum entropy direct modeling, in which the probability of a state or word sequence given an observation sequence is computed directly by the model. In contrast to HMMs, the features can be non-independent, asynchronous, and overlapping. In this paper, we discuss how to make the computationally intensive training of such models feasible by parallelizing the Improved Iterative Scaling (IIS) algorithm. Used as a stand-alone acoustic model, the direct model significantly outperforms traditional HMMs in word error rate. When combined with HMM and language model scores, it yields modest improvements over the best HMM system. The maximum entropy model can potentially incorporate non-independent features, such as acoustic-phonetic features, in a way that is robust to missing features caused by mismatch between training and testing.
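As a rough illustration of the direct-modeling idea, the sketch below trains a toy maximum entropy model P(y|x) with overlapping, non-independent indicator features. It uses Generalized Iterative Scaling, a simpler special case within the iterative scaling family that includes the IIS algorithm named above; the data, features, and function names are all illustrative assumptions, and the paper's parallelized IIS training is not reproduced here.

```python
import math

# Toy maximum entropy direct model: P(y|x) proportional to exp(sum_i lambda_i f_i(x, y)).
# Trained with Generalized Iterative Scaling (GIS), a simple member of the
# iterative scaling family. Everything here is illustrative, not from the paper.

# Training pairs: (observation, label).
data = [("a", 0), ("a", 0), ("b", 1), ("b", 1), ("b", 0)]
labels = [0, 1]

# Overlapping, non-independent binary indicator features f_i(x, y).
features = [
    lambda x, y: 1.0 if (x == "a" and y == 0) else 0.0,
    lambda x, y: 1.0 if (x == "b" and y == 1) else 0.0,
    lambda x, y: 1.0 if y == 0 else 0.0,  # overlaps with the first feature
]

def p_y_given_x(lam, x):
    """Direct model: distribution over labels given one observation."""
    scores = {y: math.exp(sum(l * f(x, y) for l, f in zip(lam, features)))
              for y in labels}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

def gis_train(iters=200):
    lam = [0.0] * len(features)
    # M bounds the number of active features per (x, y); using a constant
    # upper bound gives a damped, convergent GIS-style update.
    M = float(len(features))
    # Empirical feature expectations from the training data.
    emp = [sum(f(x, y) for x, y in data) / len(data) for f in features]
    for _ in range(iters):
        # Model feature expectations under the current parameters.
        mod = [0.0] * len(features)
        for x, _ in data:
            dist = p_y_given_x(lam, x)
            for i, f in enumerate(features):
                mod[i] += sum(dist[y] * f(x, y) for y in labels) / len(data)
        # GIS update: lambda_i += (1/M) * log(empirical / model).
        for i in range(len(features)):
            if emp[i] > 0 and mod[i] > 0:
                lam[i] += math.log(emp[i] / mod[i]) / M
    return lam

lam = gis_train()
print(p_y_given_x(lam, "a"))  # label 0 should dominate for observation "a"
```

Note that the third feature fires together with the first, so the features are deliberately not independent; the maximum entropy formulation handles this without any independence assumption, which is the property the abstract highlights over HMMs.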
Bibliographic reference. Kuo, Hong-Kwang Jeff / Gao, Yuqing (2004): "Maximum entropy direct model as a unified model for acoustic modeling in speech recognition", In INTERSPEECH-2004, 681-684.