Modeling Pronunciation Variation for Automatic Speech Recognition

Rolduc, The Netherlands
May 4-6, 1998

Context and Speed Dependent Phonemic Models for Continuous Speech Recognition

Feriel Mouria-Beji (1,2)

(1) ENSI/LIA, Artificial Intelligence Group, Tunis, Tunisia; (2) INRIA-Lorraine, Villers-les-Nancy, France

In this paper, we discuss the possibility of explicitly modeling, at a symbolic level, contextual variation and speaking-rate effects, which are the main sources of error in a continuous speech recognition system that uses phonemes as the basic recognition units. It has been shown that modeling the phonetic context improves speech recognition accuracy. This paper describes a new approach, called the automatically expanding speed and context (AESC) approach, for including context-specific modeling in the ML-VINICS system, where contextual deformations of speech are described in a speech-event-synchronized way rather than in the traditional time-synchronized way. Based on this approach, three models were developed, using respectively neural networks, stochastic trajectory modeling, and a statistical method. When tested at different speaking rates on a vocabulary of 1200 French sentences pronounced by eight male and two female speakers, they led to an improvement in the recognition rate.

Bibliographic reference.  Mouria-Beji, Feriel (1998): "Context and speed dependent phonemic models for continuous speech recognition", In MPV-1998, 79-84.