Most current state-of-the-art speech recognition systems are based on HMMs, which usually use a mixture of Gaussian functions as the state probability distribution model. It is common practice to use the EM algorithm for Gaussian mixture parameter learning. In this case, the learning is done in a "blind", data-driven way, without taking into account how the speech signal was produced and which factors it depends on. In this paper, we describe the hybrid HMM/BN acoustic modeling framework, in which, in contrast to the conventional mixture of Gaussians, the HMM state probability distribution is modeled by a Bayesian Network (BN), hence the name HMM/BN. Temporal speech characteristics are still governed by the HMM state transitions, but the state output likelihood is inferred from the BN. This allows for very flexible and consistent models of the state probability distributions, which can easily integrate different speech parameterizations. The BN can represent various speech features and environmental conditions and their underlying dependencies. We show that the conventional HMM is a special case of the HMM/BN model, which we therefore regard as a generalization of the HMM. HMM/BN parameter learning is based on the Viterbi training paradigm and consists of two alternating steps: BN training and an update of the HMM transition probabilities. For recognition, in some cases, BN inference is computationally equivalent to evaluating a mixture of Gaussians, which allows the HMM/BN model to be used in existing HMM decoders. We present several examples of applying the HMM/BN model in speech recognition systems. Evaluations under various conditions and for different tasks showed that the HMM/BN model consistently performs better than the standard mixture-of-Gaussians HMM.
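The remark that BN inference can be computationally equivalent to a mixture of Gaussians can be illustrated with a minimal sketch. It assumes a toy network in which the HMM state Q is the parent of one hidden discrete variable M, and Q and M together are the parents of a one-dimensional Gaussian observation X. This topology, the function names, and all numbers are illustrative assumptions and are not taken from the paper.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bn_state_likelihood(x, p_m_given_q, means, variances):
    """State output likelihood p(x | q) for the assumed toy BN.

    The hidden node M is summed out:
        p(x | q) = sum_m P(m | q) * N(x; mean[m], var[m])
    This has the same form as a Gaussian mixture whose weights
    are the conditional probabilities P(m | q).
    """
    return sum(
        p_m * gaussian_pdf(x, mu, var)
        for p_m, mu, var in zip(p_m_given_q, means, variances)
    )


# Hypothetical parameters for a single HMM state q.
p_m_given_q = [0.3, 0.7]   # P(M = m | Q = q), must sum to 1
means = [0.0, 2.0]         # Gaussian mean of X for each value of M
variances = [1.0, 0.5]     # Gaussian variance of X for each value of M

likelihood = bn_state_likelihood(0.5, p_m_given_q, means, variances)
print(likelihood)
```

Under these assumptions, a decoder that already evaluates Gaussian mixtures per state could compute the same quantity by treating P(m | q) as the mixture weights. Richer networks with additional observed or hidden variables would need a different inference routine.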
Cite as: Markov, K., Nakamura, S. (2004) Advanced acoustic modeling with the hybrid HMM/BN framework. Proc. 9th Conference on Speech and Computer (SPECOM 2004), 248-255
@inproceedings{markov04_specom, author={Konstantin Markov and Satoshi Nakamura}, title={{Advanced acoustic modeling with the hybrid HMM/BN framework}}, year=2004, booktitle={Proc. 9th Conference on Speech and Computer (SPECOM 2004)}, pages={248--255} }