Sixth European Conference on Speech Communication and Technology

We propose a statistical model for automatic speech recognition that relates variations in speaking rate to nonlinear warpings of time. The model describes a discrete random variable, s(t), that evolves as a function of the arc length traversed along a curve, parameterized by x(t). Since arc length does not depend on the rate at which a curve is traversed, this evolution gives rise to a family of Markov processes whose predictions, Pr[s|x], are invariant to nonlinear warpings of time. We describe the use of such models, known as Markov processes on curves (MPCs), for automatic speech recognition, where x are acoustic feature trajectories and s are phonetic transcriptions. On two tasks, recognizing New Jersey town names and connected alphadigits, we find that MPCs yield lower word error rates than comparably trained hidden Markov models.
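The invariance that the abstract relies on can be checked numerically. The following is a minimal sketch, not the authors' model: it only illustrates that the arc length of a curve stays the same when the curve is traversed at a different rate. The trajectory `curve`, the warping functions, and the step count are all hypothetical choices made for illustration.

```python
import math


def curve(u):
    """Hypothetical 2-D feature trajectory x(u), with u in [0, 1]."""
    return (math.cos(2.0 * u), math.sin(3.0 * u))


def arc_length(warp, n_steps=20000):
    """Approximate the arc length of curve(warp(t)) for t in [0, 1].

    The curve is sampled at evenly spaced times t and the lengths of the
    chords between consecutive samples are summed.
    """
    total = 0.0
    prev = curve(warp(0.0))
    for k in range(1, n_steps + 1):
        point = curve(warp(k / n_steps))
        total += math.dist(prev, point)
        prev = point
    return total


def uniform(t):
    """Traverse the curve at a constant rate."""
    return t


def slow_then_fast(t):
    """Monotonic nonlinear time warp with warp(0) = 0 and warp(1) = 1."""
    return t ** 3


length_uniform = arc_length(uniform)
length_warped = arc_length(slow_then_fast)

print(f"uniform traversal: {length_uniform:.6f}")
print(f"warped traversal:  {length_warped:.6f}")
```

The two printed lengths should agree up to discretization error, even though the warped traversal moves slowly at first and quickly at the end. A process whose state evolves as a function of this arc length, rather than of elapsed time, therefore sees the same input under either traversal.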
Bibliographic reference. Saul, Lawrence / Rahim, Mazin (1999): "Modeling the rate of speech by Markov processes on curves", In EUROSPEECH'99, 415-418.