Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

The AT&T Large Vocabulary Conversational Speech Recognition System

Andrej Ljolje, Michael D. Riley, Donald M. Hindle

AT&T Labs - Research, Florham Park, NJ, USA

We describe the AT&T recognition system used in the DARPA Large Vocabulary Conversational Speech Recognition (LVCSR-98) evaluation. It is based on multi-pass rescoring of weighted Finite State Machines (FSMs) using progressively more accurate acoustic models. The acoustic models used in the system are all gender-independent. They are based on three-state context-dependent hidden Markov models using Gaussian mixtures. The recognition paradigm uses the baseline system to generate a set of word lattices. Subsequent passes use Vocal Tract Normalization (VTN), Maximum Likelihood Linear Regression (MLLR) adaptation, and ROVER to further refine the recognition output. All the acoustic models (except for one of the additional models used in the ROVER experiments) employed models of alternative pronunciations to improve recognition performance. The overall word error rate on the LVCSR-98 evaluation set was 44.1%.
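For readers unfamiliar with the metric, word error rate is conventionally defined as the number of word substitutions, deletions, and insertions in the minimum-cost alignment of the hypothesis to the reference, divided by the number of reference words. The following is a minimal sketch of that definition only; the function name `word_error_rate` is hypothetical, and it is not the scoring tool used in the evaluation, which may apply text normalization that is not modeled here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: whitespace tokenization, unit costs for
    substitution, deletion, and insertion, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

Under this definition a reported figure of 44.1% means that the total count of substitutions, deletions, and insertions equals 44.1% of the number of reference words; because insertions are counted, the rate can in principle exceed 100%.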

