International Symposium on Chinese Spoken Language Processing
August 23-24, 2002
The Inhomogeneous Hidden Markov Models and its Training and Recognition Algorithms of Speech Recognition
Zuoying Wang, Xi Xiao
E.E. Department of Tsinghua University, Beijing, China
While hidden Markov models (HMMs) have been applied with great success in speech
recognition, the prevailing homogeneous HMMs cannot properly describe much of the
important information carried by speech state durations. In this report, a general
inhomogeneous HMM (IHMM) is proposed, based on the characteristics of speech and on a
formal definition of the HMM, and it is proven that the state duration distribution model
is an equivalent representation of the inhomogeneous HMM. An iterative algorithm for
training the parameters of the IHMM and a fast decoding algorithm based on the most
likely state sequence are also given in the report.
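The link between duration distributions and duration-dependent transitions can be illustrated with a minimal sketch. It is not the authors' formulation: the function names, the truncation at a maximum duration, and the example distribution are assumptions made only for illustration. A homogeneous HMM with a constant self-transition probability a implies a geometric state duration, P(d) = a^(d-1)(1-a), whose most probable duration is always one frame. Letting the self-transition probability depend on the time already spent in the state allows any finite duration distribution to be represented, and the two descriptions can be converted into each other.

```python
def geometric_duration_pmf(self_loop_prob, max_duration):
    """Duration pmf implied by a constant self-loop: P(d) = a**(d-1) * (1-a)."""
    a = self_loop_prob
    return [a ** (d - 1) * (1.0 - a) for d in range(1, max_duration + 1)]


def self_loop_from_duration(pmf):
    """Self-loop probability after d frames in the state: 1 - P(d) / P(D >= d)."""
    probs = []
    for i, p in enumerate(pmf):
        tail = sum(pmf[i:])  # probability of staying at least i+1 frames
        probs.append(1.0 - p / tail if tail > 0.0 else 0.0)
    return probs


def duration_from_self_loop(probs):
    """Inverse mapping: rebuild the duration pmf from duration-dependent self-loops."""
    pmf = []
    stay = 1.0  # probability of still being in the state at this frame
    for a in probs:
        pmf.append(stay * (1.0 - a))
        stay *= a
    return pmf


def most_likely_duration(pmf):
    """Most probable duration in frames (1-based)."""
    return max(range(len(pmf)), key=lambda i: pmf[i]) + 1


# Homogeneous case: constant self-loop of 0.8, truncated at 10 frames.
homogeneous = geometric_duration_pmf(0.8, 10)

# Hypothetical explicit duration distribution peaking at 4 frames (sums to 1).
explicit = [0.02, 0.08, 0.20, 0.30, 0.20, 0.10, 0.05, 0.03, 0.01, 0.01]

# Duration-dependent self-loops equivalent to the explicit distribution.
inhomogeneous_loops = self_loop_from_duration(explicit)
recovered = duration_from_self_loop(inhomogeneous_loops)
```

In this sketch `recovered` reproduces `explicit` up to rounding, which is the sense in which the two representations are interchangeable here. The geometric case cannot place its peak anywhere other than the first frame, whereas the duration-dependent self-loops place it at four frames.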
The report also discusses the advantages of the duration distribution representation of
the IHMM and of the proposed training and recognition algorithms. The proposed IHMM and
its duration representation keep the capability of modeling the temporal and spatial
correlation of the speech features, and they are better suited to applications that need
improved modeling of variation in speaking rate and of stammering in speech, both of
which are related to the state duration distributions. Finally, experiments comparing the
performance of the inhomogeneous HMM with the dominant classical HMM are presented in the
Wang, Zuoying / Xiao, Xi (2002):
"The inhomogeneous hidden Markov models and its training and recognition algorithms of speech recognition",
In ISCSLP 2002, paper INV1.