7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Modeling HMM State Distributions with Bayesian Networks

Konstantin Markov, Satoshi Nakamura

ATR Spoken Language Translation Research Labs, Japan

In current HMM-based speech recognition systems, it is difficult to supplement acoustic spectrum features with additional information such as pitch, gender, or articulator positions. Bayesian Networks (BN), on the other hand, allow different continuous and discrete features to be combined easily by exploiting the conditional dependencies between them. However, the lack of efficient algorithms has limited their application in continuous speech recognition. In this paper we propose a new acoustic model in which HMMs model the temporal characteristics of speech and the state probability model is represented by a BN. In our experimental system based on the HMM/BN model, the state BN has, in addition to the speech observation variable, two more (hidden) variables representing noise type and SNR value. Evaluation on the AURORA2 database showed a 36.4% word error rate reduction on the closed noise test without using any model adaptation or noise-robust methods.
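As a rough illustration of how a state BN with hidden noise-type and SNR variables could yield an HMM state likelihood, the sketch below marginalizes the observation density over both hidden variables. It is not the authors' implementation. The abstract does not give the BN topology or parameterization, so the following are assumptions made only for this example: both hidden variables are discrete (SNR quantized into levels), they are independent of each other given the state, and the observation given (noise type, SNR, state) follows a diagonal-covariance Gaussian. All names and numeric values are hypothetical.

```python
import math
from itertools import product


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_state_likelihood(x, p_noise, p_snr, gaussians):
    """Assumed form: log p(x | q) = log sum_{n,s} P(n | q) P(s | q) p(x | n, s, q).

    p_noise   -- dict: noise type -> probability for this state (assumed discrete)
    p_snr     -- dict: SNR level  -> probability for this state (assumed quantized)
    gaussians -- dict: (noise type, SNR level) -> (mean, variance) lists
    """
    terms = []
    for n, s in product(p_noise, p_snr):
        mean, var = gaussians[(n, s)]
        terms.append(
            math.log(p_noise[n]) + math.log(p_snr[s]) + log_gauss_diag(x, mean, var)
        )
    # Log-sum-exp for numerical stability.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Toy parameters for a single HMM state (hypothetical values, 2-dim features).
p_noise = {"subway": 0.5, "car": 0.5}
p_snr = {"low": 0.4, "high": 0.6}
gaussians = {
    ("subway", "low"): ([0.0, 0.0], [1.0, 1.0]),
    ("subway", "high"): ([0.5, 0.2], [0.8, 0.9]),
    ("car", "low"): ([-0.3, 0.1], [1.2, 1.0]),
    ("car", "high"): ([0.4, -0.1], [0.7, 0.8]),
}

frame = [0.3, 0.1]
print(log_state_likelihood(frame, p_noise, p_snr, gaussians))
```

Under these assumptions the state likelihood has the same form as a Gaussian mixture whose components are indexed by (noise type, SNR level), so it could be plugged into standard HMM decoding in place of the usual state output density.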


Bibliographic reference.  Markov, Konstantin / Nakamura, Satoshi (2002): "Modeling HMM state distributions with Bayesian networks", In ICSLP-2002, 1013-1016.