Third International Conference on Spoken Language Processing (ICSLP 94)
A major research activity at LIMSI is multilingual, speaker-independent, large vocabulary speech dictation. In this paper we report on efforts in large vocabulary, speaker-independent continuous speech recognition of French using the BREF corpus. Recognition experiments were carried out with vocabularies containing up to 20k words. The recognizer makes use of continuous density HMMs with Gaussian mixtures for acoustic modeling and n-gram statistics estimated on 38 million words of newspaper text from Le Monde for language modeling. The recognizer uses a time-synchronous graph-search strategy. When a bigram language model is used, recognition is carried out in a single forward pass. A second forward pass, which makes use of a word graph generated with the bigram language model, incorporates a trigram language model. Acoustic modeling uses cepstrum-based features, context-dependent phone models and phone duration models. An average phone accuracy of 86% was achieved. A word accuracy of 84% was obtained for an unrestricted vocabulary test and 95% for a 5k vocabulary test.
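The second pass described in the abstract, in which a trigram language model is applied to a word graph, can be illustrated with a minimal sketch. This is not the LIMSI implementation: the toy graph, the acoustic scores, the trigram table, the flat floor for unseen trigrams, and the names `WORD_GRAPH`, `TRIGRAM_LOGP`, `FLOOR`, and `rescore` are all hypothetical, and the bigram scores that would normally accompany the graph are left out. The sketch only shows the general idea: each search state is a graph node paired with the last two words, so that a trigram score can be added on every edge.

```python
# Hypothetical toy word graph: node -> list of (next_node, word, acoustic_log_score).
# Nodes are assumed to be numbered in time (topological) order; 0 is the start, 3 the end.
WORD_GRAPH = {
    0: [(1, "le", -1.0), (1, "les", -1.2)],
    1: [(2, "chat", -2.0), (2, "chats", -1.9)],
    2: [(3, "dort", -1.5), (3, "dorment", -1.6)],
    3: [],
}

# Hypothetical trigram log-probabilities. Unseen trigrams receive a flat floor,
# a simplification standing in for proper backoff or smoothing.
TRIGRAM_LOGP = {
    ("<s>", "le", "chat"): -0.5,
    ("<s>", "les", "chats"): -0.7,
    ("le", "chat", "dort"): -0.4,
    ("les", "chats", "dorment"): -0.4,
}
FLOOR = -5.0


def trigram_logp(w1, w2, w3):
    """Log-probability of w3 given the two preceding words (toy table lookup)."""
    return TRIGRAM_LOGP.get((w1, w2, w3), FLOOR)


def rescore(graph, start, end, lm_weight=1.0):
    """Return (score, words) of the best path through the word graph.

    Search states are (node, (w1, w2)) so the trigram history is known on every edge.
    """
    best = {(start, ("<s>", "<s>")): (0.0, [])}
    for node in sorted(graph):  # relies on the topological numbering assumed above
        # Snapshot the states at this node before extending them.
        states = [(key, val) for key, val in best.items() if key[0] == node]
        for (_, hist), (score, words) in states:
            for nxt, word, acoustic in graph[node]:
                lm = trigram_logp(hist[0], hist[1], word)
                new_score = score + acoustic + lm_weight * lm
                key = (nxt, (hist[1], word))
                # Keep only the best-scoring path for each (node, history) state.
                if key not in best or new_score > best[key][0]:
                    best[key] = (new_score, words + [word])
    finals = [val for key, val in best.items() if key[0] == end]
    return max(finals, key=lambda item: item[0])


if __name__ == "__main__":
    print(rescore(WORD_GRAPH, 0, 3, lm_weight=0.0))  # acoustic scores only
    print(rescore(WORD_GRAPH, 0, 3, lm_weight=1.0))  # with the toy trigram model
```

In this toy example the acoustic scores alone favor a number-inconsistent word sequence, while adding the trigram scores selects a consistent one. Restricting the trigram search to a word graph keeps the enlarged state space (node plus two-word history) limited to hypotheses that survived the first pass.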
Bibliographic reference. Gauvain, Jean-Luc / Lamel, Lori F. / Adda, Gilles / Adda-Decker, Martine (1994): "Continuous speech dictation in French", In ICSLP-1994, 2127-2130.