International Symposium on Chinese Spoken Language Processing (ISCSLP 2002)

Taipei, Taiwan
August 23-24, 2002

The Inhomogeneous Hidden Markov Models and its Training and Recognition Algorithms of Speech Recognition

Zuoying Wang, Xi Xiao

E.E. Department of Tsinghua University, Beijing, China

Hidden Markov models (HMMs) have been applied to speech recognition with great success, but the prevailing homogeneous HMMs cannot properly describe much of the important information carried by speech state durations. In this report, a general inhomogeneous HMM (IHMM) is proposed, based on a formal definition of the HMM and on the characteristics of speech, and it is proven that the state duration distribution model is an equivalent representation of the inhomogeneous HMM. An iterative algorithm for training the IHMM parameters and a fast decoding algorithm based on the most likely state sequence are also given. The report further discusses the advantages of the duration distribution representation of the IHMM and of the proposed training and recognition algorithms. The proposed IHMM and its duration representation retain the ability to model the temporal and spatial correlation of speech features, and are better suited to modeling variations in speaking rate and stammering, both of which are related to the state duration distributions. Finally, experiments comparing the performance of the inhomogeneous HMM with that of the dominant classical HMM are presented.
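
The abstract does not spell out the paper's formulation, so the following is only a minimal sketch of the kind of equivalence it claims. It assumes a single left-to-right state whose probability of staying depends only on how many frames have already been spent in that state. Under that assumption, the duration-dependent stay probabilities and the state duration distribution determine each other. The function names and the example values are hypothetical and are not taken from the paper.

```python
def duration_pmf_from_stay_probs(stay):
    """Convert duration-dependent stay probabilities into a duration pmf.

    stay[d-1] is the assumed probability of remaining in the state after
    having spent d frames in it. The result pmf[d-1] is P(duration == d).
    """
    pmf = []
    survive = 1.0  # probability that the state has lasted at least d frames
    for a in stay:
        pmf.append(survive * (1.0 - a))  # leave exactly after d frames
        survive *= a                     # or stay for one more frame
    return pmf


def stay_probs_from_duration_pmf(pmf):
    """Inverse mapping: recover stay probabilities from a duration pmf."""
    # tails[d-1] = P(duration >= d), built as suffix sums of the pmf.
    tails = []
    tail = 0.0
    for p in reversed(pmf):
        tail += p
        tails.append(tail)
    tails.reverse()
    # Leaving probability after d frames is P(duration == d) / P(duration >= d).
    return [1.0 - p / t if t > 0 else 0.0 for p, t in zip(pmf, tails)]


# Hypothetical inhomogeneous state: likely to stay early on, forced to
# leave after three frames (final stay probability is 0).
stay = [0.9, 0.5, 0.0]
pmf = duration_pmf_from_stay_probs(stay)
recovered = stay_probs_from_duration_pmf(pmf)

# Homogeneous special case: a constant stay probability gives a
# (truncated) geometric duration distribution.
geometric = duration_pmf_from_stay_probs([0.8] * 4)
```

In this sketch, a constant stay probability reproduces the geometric duration distribution implied by a classical homogeneous HMM, while letting the stay probability vary with elapsed duration allows an arbitrary duration distribution on a finite support.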


Bibliographic reference.  Wang, Zuoying / Xiao, Xi (2002): "The inhomogeneous hidden Markov models and its training and recognition algorithms of speech recognition", In ISCSLP 2002, paper INV1.