When hidden Markov models are used for speech recognition, it is usually assumed that the probability that a particular acoustic vector is emitted at a given time depends only on the current state and the current acoustic vector. This model does not take into account the time correlation between successive acoustic vectors. We recently introduced two models that try to account for the continuous-time dynamic nature of the speech signal. The first model assumes that, in a given state, the acoustic vectors are generated by a linear stochastic differential equation; the second assumes that they are generated by a particular continuous-time Markov process. This work is motivated by the fact that the time evolution of the acoustic vector is inherently dynamic and continuous, so that the modelling could be performed in the continuous-time domain instead of the discrete-time domain. Moreover, the links between the discrete-time process obtained after sampling and the original continuous-time system are not trivial. In particular, the relationship between the coefficients of a continuous-time linear process and those of the discrete-time linear process obtained after sampling is nonlinear. We assign a probability density to the continuous-time trajectory of the acoustic vector inside the state, reflecting the probability that this particular path has been generated by the stochastic process associated with this state. This allows us to compute the likelihood of the uttered word. Reestimation formulae for the parameters of the process, based on the maximization of the likelihood, can be derived for the Viterbi algorithm. As usual, the segmentation can be obtained by sampling the continuous process and applying dynamic programming to find the best path over all possible sequences of states.
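The nonlinear link between continuous-time and sampled coefficients can be illustrated with a minimal sketch. It assumes the simplest possible case, a scalar linear stochastic differential equation dx = a x dt + sigma dW sampled at a fixed interval dt, which is not necessarily the model used in the paper (the acoustic vectors there are multidimensional). The function names and parameter values are hypothetical and chosen only for illustration.

```python
import math


def discretize_linear_sde(a, sigma, dt):
    """Exact sampling of the scalar SDE dx = a*x dt + sigma dW (illustrative assumption).

    Returns (phi, q) such that x[k+1] = phi * x[k] + w[k], with w[k] ~ N(0, q).
    """
    # The discrete-time coefficient is an exponential, hence nonlinear, function of a.
    phi = math.exp(a * dt)
    if a == 0.0:
        # Limit case: pure Brownian motion, variance grows linearly with dt.
        q = sigma ** 2 * dt
    else:
        q = sigma ** 2 * (math.exp(2.0 * a * dt) - 1.0) / (2.0 * a)
    return phi, q


def recover_continuous_coefficient(phi, dt):
    """Invert the mapping phi = exp(a*dt); valid only for phi > 0."""
    return math.log(phi) / dt


# Hypothetical values: a stable process sampled every 10 ms.
phi, q = discretize_linear_sde(a=-2.0, sigma=0.5, dt=0.01)
a_back = recover_continuous_coefficient(phi, dt=0.01)
```

In this scalar case the sampled coefficient is exp(a dt) rather than a itself, so estimating the discrete-time model and estimating the continuous-time model are not interchangeable by a simple rescaling. For a vector process, the same role would be played by a matrix exponential.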
Keywords: Hidden Markov models, speech recognition.
Bibliographic reference. Saerens, Marco (1993): "Hidden Markov models assuming a continuous-time dynamic emission of acoustic vectors", In EUROSPEECH'93, 587-590.