EUROSPEECH '93

When hidden Markov models are used for speech recognition, it is usually assumed that the probability of emitting a particular acoustic vector at a given time depends only on the current state and on the acoustic vector currently observed. This assumption does not account for the dynamic nature of the speech signal. To introduce time correlation between successive acoustic vectors, some authors have proposed modeling the time series of observations in a state as generated by a nonlinear deterministic process corrupted by additive Gaussian noise. As a result, the prediction error enters the likelihood function. In this paper, we review the basic ideas underlying these models. We then briefly introduce an extension of the linear case in which the autoregressive coefficients themselves are corrupted by noise. When working at the level of speech samples, this is a simple way to take intra- and inter-speaker variability into account, that is, to allow variability in the transfer function. This is in fact what is done when LPC coefficients are extracted and clustered with Gaussian distributions; the advantage here is that the variability is introduced directly at the sample level. This leads to processes known as AutoRegressive Conditional Heteroscedastic (ARCH) processes, whose variances are nonconstant and conditional on the past.
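As an illustration of the last point, the following is a minimal sketch rather than the formulation used in the paper. It assumes scalar speech samples, a linear autoregressive predictor of order p on a single state, and independent zero-mean Gaussian noise on each coefficient; the function and parameter names are hypothetical. Under these assumptions, the variance of a sample given its past is the additive noise variance plus a term weighted by the squared past samples, so the prediction error enters the Gaussian log-likelihood with a variance that changes over time.

```python
import math


def conditional_variance(past, coef_var, noise_var):
    """Variance of x_t given the p previous samples.

    Assumption: coefficient noises are independent, zero-mean Gaussian,
    so Var[x_t | past] = noise_var + sum_k coef_var[k] * past[k]**2.
    """
    return noise_var + sum(v * x * x for v, x in zip(coef_var, past))


def log_likelihood(samples, coef_mean, coef_var, noise_var):
    """Gaussian log-likelihood of the samples, conditional on the first p."""
    p = len(coef_mean)
    total = 0.0
    for t in range(p, len(samples)):
        # past = [x_{t-1}, ..., x_{t-p}]
        past = [samples[t - k] for k in range(1, p + 1)]
        # Linear prediction from the mean autoregressive coefficients.
        prediction = sum(a * x for a, x in zip(coef_mean, past))
        error = samples[t] - prediction
        var = conditional_variance(past, coef_var, noise_var)
        total += -0.5 * (math.log(2.0 * math.pi * var) + error * error / var)
    return total
```

Setting every entry of `coef_var` to zero recovers the ordinary autoregressive case with constant variance, in which maximizing the likelihood amounts to minimizing the squared prediction error.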
Keywords: Hidden Markov models, autoregressive models.
Bibliographic reference. Saerens, Marco / Bourlard, Hervé (1993): "Linear and nonlinear prediction for speech recognition with hidden Markov models", In EUROSPEECH'93, 807-810.