Modeling Pronunciation Variation for Automatic Speech Recognition
Rolduc, The Netherlands
In this paper, we discuss the possibility of explicitly modeling, at a symbolic level, contextual variation and speaking-rate effects, which are the main sources of error in a continuous speech recognition system that uses phonemes as its basic recognition units. It has been shown that modeling the phonetic context improves speech recognition accuracy. This paper describes a new approach, called the automatically expending speed and context (AESC) approach, for including context-specific modeling in the ML-VINICS system, in which contextual deformations of the speech are described in a speech-event-synchronized way rather than in the traditional time-synchronized way. Based on this approach, three models were developed, using neural networks, stochastic trajectory modeling, and a statistical method, respectively. When tested at different speaking rates on a vocabulary of 1200 French sentences, pronounced by eight male and two female speakers, they led to an improvement in the recognition rate.
Bibliographic reference. Mouria-Beji, Feriel (1998): "Context and speed dependent phonemic models for continuous speech recognition", In MPV-1998, 79-84.