Typical statistical speech recognition systems model the duration of acoustic segments with discrete-time, first-order Markov chains. Recent work in the area of hidden Markov models (HMMs) has extended the modeling approach to discrete-time, first-order semi-Markov processes. The Markov assumption that the durations of a model's states are independent can result in word-duration statistics that are quite different from those observed during recognition or obtained through labeling. This paper presents an approach for extending the HMM framework such that a priori conditions on the model duration statistics are satisfied. The constrained-variance HMM (CV-HMM) is presented as a means of retaining the robust state-duration modeling capability of the traditional hidden semi-Markov model (HSMM) while imposing constraints on the word-duration variance. The paper presents the CV-HMM framework and describes the parameter estimation procedure. Results on a speaker-independent, connected-speech task are reported and compared with traditional HMM approaches.
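To make the quantity being constrained concrete, the sketch below computes word-duration mean and variance when state durations are treated as independent, which is the assumption criticized above. It is an illustration only, not the CV-HMM estimation procedure: it assumes a strictly left-to-right word model with no skip transitions, a geometric duration for each standard HMM state (self-loop probability a, so P(d) = a^(d-1)(1 - a) for d >= 1 frames), and an explicit duration pmf for each HSMM state. All function names are hypothetical. Under independence, the word duration is a sum of state durations, so both the means and the variances simply add; nothing in this additive model ties the resulting word-duration variance to what is observed in labeled data.

```python
from __future__ import annotations

from typing import Sequence


def geometric_duration_moments(self_loop: float) -> tuple[float, float]:
    """Mean and variance (in frames) of a standard HMM state's duration.

    Assumes P(d) = a**(d - 1) * (1 - a) for d >= 1, with a = self_loop.
    """
    if not 0.0 <= self_loop < 1.0:
        raise ValueError("self-loop probability must be in [0, 1)")
    mean = 1.0 / (1.0 - self_loop)
    var = self_loop / (1.0 - self_loop) ** 2
    return mean, var


def explicit_duration_moments(pmf: Sequence[float]) -> tuple[float, float]:
    """Mean and variance of an explicit (semi-Markov) state-duration pmf.

    pmf[k] is taken to be P(d = k + 1); the pmf is normalized defensively.
    """
    total = sum(pmf)
    if total <= 0.0:
        raise ValueError("duration pmf must have positive mass")
    mean = sum((k + 1) * p for k, p in enumerate(pmf)) / total
    var = sum((k + 1 - mean) ** 2 * p for k, p in enumerate(pmf)) / total
    return mean, var


def word_duration_moments(
    state_moments: Sequence[tuple[float, float]],
) -> tuple[float, float]:
    """Word-duration mean and variance for a left-to-right chain of states.

    Because the state durations are assumed independent, the word duration
    is their sum, and both means and variances add term by term.
    """
    mean = sum(m for m, _ in state_moments)
    var = sum(v for _, v in state_moments)
    return mean, var


if __name__ == "__main__":
    # Hypothetical three-state word model with self-loop 0.8 in every state.
    hmm_word = word_duration_moments([geometric_duration_moments(0.8)] * 3)
    # Hypothetical HSMM states with durations of 2, 3 or 4 frames each.
    hsmm_state = explicit_duration_moments([0.0, 0.25, 0.5, 0.25])
    hsmm_word = word_duration_moments([hsmm_state] * 3)
    print("HMM word duration (mean, var):", hmm_word)
    print("HSMM word duration (mean, var):", hsmm_word)
```

With these illustrative numbers, both word models have roughly the same mean duration of 15 frames, but the geometric model's variance is about 60 frames squared versus 1.5 for the explicit-duration model. The explicit-duration model controls each state well, yet its word-duration variance is still fixed implicitly by the sum of the state variances rather than constrained directly.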
Bibliographic reference. Hochberg, Michael M. / Silverman, Harvey F. (1993): "Constraining model duration variance in HMM-based connected-speech recognition", In EUROSPEECH'93, 323-326.