Third European Conference on Speech Communication and Technology

Berlin, Germany
September 22-25, 1993


Constraining Model Duration Variance in HMM-Based Connected-Speech Recognition

Michael M. Hochberg (1), Harvey F. Silverman (2)

(1) Cambridge University Engineering Department, Cambridge, UK
(2) LEMS, Division of Engineering, Brown University, Providence, RI, USA

Typical statistical speech recognition systems model the duration of acoustic segments with discrete-time, first-order Markov chains. Recent work in the area of hidden Markov models (HMMs) has extended the modeling approach to discrete-time, first-order semi-Markov processes, giving hidden semi-Markov models (HSMMs). The Markov assumption that the states of a model are independent can result in word-duration statistics which are quite different from those observed during recognition or obtained through labeling. This paper presents an approach for extending the HMM framework such that a priori conditions on the model duration statistics are satisfied. The constrained variance HMM (CV-HMM) is presented as a means to retain the robust state-duration modeling capability of the traditional HSMM while imposing constraints on the word-duration variance. The paper presents the CV-HMM framework and describes the parameter estimation procedure. Results on a speaker-independent, connected-speech task are reported and compared with traditional HMM approaches.
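
The effect of the independence assumption on word-duration statistics can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a left-to-right HMM without skip transitions, in which each state is occupied for a geometrically distributed number of frames, and the function name `word_duration_stats` is hypothetical. Under that assumption the word-duration mean and variance are simply the sums of the per-state values, so the model has no separate control over the word-level variance.

```python
def word_duration_stats(self_loop_probs):
    """Mean and variance (in frames) of the total duration of a
    left-to-right HMM with no skips (illustrative assumption).

    A state with self-loop probability a is occupied for a geometric
    number of frames: mean 1 / (1 - a), variance a / (1 - a)**2.
    Because state durations are independent, means and variances add.
    """
    mean = 0.0
    var = 0.0
    for a in self_loop_probs:
        if not 0.0 <= a < 1.0:
            raise ValueError("self-loop probability must be in [0, 1)")
        mean += 1.0 / (1.0 - a)
        var += a / (1.0 - a) ** 2
    return mean, var


# Three states, each with self-loop probability 0.5:
# each state contributes mean 2 and variance 2.
mean, var = word_duration_stats([0.5, 0.5, 0.5])
print(mean, var)  # 6.0 6.0
```

Word-duration statistics measured from labeled speech need not match these summed values, which is the kind of mismatch a constraint on the word-duration variance is meant to address.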


Bibliographic reference.  Hochberg, Michael M. / Silverman, Harvey F. (1993): "Constraining model duration variance in HMM-based connected-speech recognition", In EUROSPEECH'93, 323-326.