European Conference on Speech Technology
Edinburgh, Scotland, UK
Semi-hidden Markov models (SHMMs) are proposed and applied to isolated, speaker-dependent E-set recognition. The SHMM differs from the conventional hidden Markov model (HMM) in that its states can be classified into types. A function that detects the signals corresponding to each state type is therefore included in the SHMM and used to supervise the estimation of its parameters. In the recognition experiment, this general structure is implemented as models whose states are classified into stationary and transient types. The average recognition error rate is about 18.9%, which compares favourably with the average of about 36.4% reported by Lienard and Soong (ref 3) for a dynamic time warping (DTW) recognition system on an equivalent vocabulary. Tests using corresponding HMMs give results similar to those of the DTW system.
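The idea of typed states supervised by a detection function can be illustrated with a minimal sketch. It is not the authors' procedure: the detector (a threshold on frame-to-frame Euclidean distance), the mismatch-count cost, and the names `detect_types` and `align_to_typed_states` are all assumptions made for illustration. Each frame is labelled stationary or transient, and a left-to-right dynamic-programming alignment then assigns frames to states so that as few frames as possible fall in a state of the other type.

```python
from math import sqrt

STATIONARY, TRANSIENT = "S", "T"


def detect_types(frames, threshold):
    """Hypothetical detector: a frame is transient if it moved far from its predecessor."""
    labels = []
    for t, frame in enumerate(frames):
        prev = frames[t - 1] if t > 0 else frame  # first frame has no predecessor
        delta = sqrt(sum((a - b) ** 2 for a, b in zip(frame, prev)))
        labels.append(TRANSIENT if delta > threshold else STATIONARY)
    return labels


def align_to_typed_states(labels, state_types):
    """Left-to-right alignment (stay or advance by one state per frame).

    Starts in the first state, ends in the last, and minimises the number of
    frames whose detected type differs from the type of their state.
    Returns (state index per frame, number of mismatches).
    """
    T, N = len(labels), len(state_types)
    if T < N:
        raise ValueError("need at least one frame per state")
    INF = float("inf")
    cost = [[INF] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    cost[0][0] = int(labels[0] != state_types[0])
    for t in range(1, T):
        for j in range(N):
            stay = cost[t - 1][j]
            move = cost[t - 1][j - 1] if j > 0 else INF
            best = min(stay, move)
            if best < INF:
                cost[t][j] = best + int(labels[t] != state_types[j])
                back[t][j] = j if stay <= move else j - 1
    # Trace the best path back from the final state.
    path = [N - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, cost[T - 1][N - 1]


if __name__ == "__main__":
    # Toy one-dimensional "feature" track: steady, a jump, steady again.
    frames = [[0.0], [0.0], [0.1], [1.0], [2.0], [2.0], [2.05]]
    labels = detect_types(frames, threshold=0.5)
    path, mismatches = align_to_typed_states(labels, [STATIONARY, TRANSIENT, STATIONARY])
    print(labels, path, mismatches)
```

In a fuller system, the frames assigned to each state by such an alignment would feed the re-estimation of that state's parameters, so the detector constrains which data each state type learns from.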
Bibliographic reference. Zhang, X. / Mason, John S. D. (1987): "Speech recognition using semi-hidden Markov models of multiple features", In ECST-1987, 2445-2448.