ASR2000 - Automatic Speech Recognition: Challenges for the new Millenium

September 18-20, 2000
Paris, France

Weighted Finite-State Transducers in Speech Recognition

Mehryar Mohri (1), Fernando Pereira (2), and Michael Riley (1)

(1) AT&T Labs - Research, Florham Park, NJ, USA
(2) WhizBang! Labs, Pittsburgh, PA, USA

We survey the weighted finite-state transducer (WFST) approach to speech recognition developed at AT&T over the last several years. We show that WFSTs provide a common and natural representation for hidden Markov models (HMMs), context-dependency, pronunciation dictionaries, grammars, and alternative recognition outputs. Furthermore, general finite-state operations combine these representations flexibly and efficiently. Weighted determinization and minimization algorithms optimize their time and space requirements, and a weight pushing algorithm distributes the weights along the paths of a weighted transducer optimally for speech recognition. As an example, we describe a North American Business News (NAB) recognition system built using these techniques. It combines the HMMs, full cross-word triphones, a lexicon of forty thousand words, and a large trigram grammar into a single weighted transducer that is only somewhat larger than the trigram word grammar and that runs NAB in real time on a very simple decoder. In another example, we show that the same techniques can be used to optimize lattices for second-pass recognition. In a third example, we show how finite-state operations can be used to assemble lattices from different recognizers to improve recognition performance.
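
To make the idea of combining representations with finite-state operations concrete, the following is a minimal sketch of weighted composition followed by a best-path search. It is not the authors' implementation. It assumes the tropical semiring (weights are costs that add along a path, and the cheapest path wins), transducers without epsilon labels, and non-negative weights. The dictionary layout, the function names compose and best_path, and the toy transducers T1 and G are all hypothetical and chosen only for illustration.

```python
import heapq
from collections import defaultdict
from itertools import count

# Hypothetical layout: {"start": s, "finals": {state: cost},
#                       "arcs": {state: [(in_label, out_label, cost, next_state)]}}

def compose(a, b):
    """Epsilon-free composition: match output labels of a with input labels of b."""
    start = (a["start"], b["start"])
    arcs, finals = defaultdict(list), {}
    stack, seen = [start], {start}
    while stack:
        state = stack.pop()
        qa, qb = state
        if qa in a["finals"] and qb in b["finals"]:
            finals[state] = a["finals"][qa] + b["finals"][qb]
        for i, mid, w1, na in a["arcs"].get(qa, []):
            for mid2, o, w2, nb in b["arcs"].get(qb, []):
                if mid == mid2:
                    nxt = (na, nb)
                    arcs[state].append((i, o, w1 + w2, nxt))  # costs add
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return {"start": start, "finals": finals, "arcs": dict(arcs)}

def best_path(t):
    """Cheapest accepting path via Dijkstra; assumes non-negative costs."""
    tie = count()  # tie-breaker so the heap never compares states
    heap = [(0.0, next(tie), t["start"], ())]
    done = set()
    while heap:
        cost, _, state, outs = heapq.heappop(heap)
        if state is None:  # sentinel: a final state was left with its final cost
            return cost, outs
        if state in done:
            continue
        done.add(state)
        if state in t["finals"]:
            heapq.heappush(heap, (cost + t["finals"][state], next(tie), None, outs))
        for _i, out, w, nxt in t["arcs"].get(state, []):
            if nxt not in done:
                heapq.heappush(heap, (cost + w, next(tie), nxt, outs + (out,)))
    return None  # no accepting path

# Toy model 1: observations x1, x2 mapped to symbols a/b with costs.
T1 = {"start": 0, "finals": {2: 0.0},
      "arcs": {0: [("x1", "a", 0.5, 1), ("x1", "b", 1.0, 1)],
               1: [("x2", "a", 2.0, 2), ("x2", "b", 0.25, 2)]}}
# Toy model 2: a small weighted grammar mapping a/b to A/B.
G = {"start": 0, "finals": {2: 0.0},
     "arcs": {0: [("a", "A", 1.5, 1), ("b", "B", 0.25, 1)],
              1: [("a", "A", 0.5, 2), ("b", "B", 0.5, 2)]}}

cost, outputs = best_path(compose(T1, G))
print(cost, outputs)
```

In this toy setting, the composed machine scores every path with both models at once, so a simple search over a single transducer replaces separate handling of each knowledge source. Determinization, minimization, and weight pushing, which the abstract names as the optimization steps, are not shown here.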



Bibliographic reference.  Mohri, Mehryar / Pereira, Fernando / Riley, Michael (2000): "Weighted finite-state transducers in speech recognition", In ASR-2000, 97-106.