Sixth European Conference on Speech Communication and Technology
This paper describes the new Philips Research decoder, which performs large-vocabulary continuous speech recognition in a single pass with cross-word acoustic models and an m-gram language model (with m up to 4), in contrast to our previous multiple-pass technique. The decoder is based on a time-synchronous beam search and a prefix-tree structure of the lexicon. Cross-word transitions are treated dynamically. A language-model look-ahead technique is applied to the bigram probabilities. On a variety of speech data, reduced error rates are obtained together with significant speed-ups, confirming the advantage of using all available knowledge sources early. In particular, the search effort of one-pass trigram decoding is only marginally higher than that of bigram decoding, and the integration of cross-word triphones improves overall accuracy by typically 10% relative.
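As a rough illustration of two ideas named in the abstract, the prefix-tree lexicon and language-model look-ahead, the Python sketch below builds a small pronunciation tree and computes, for a tree node, the best language-model score among the words still reachable from it. This is not the Philips decoder: the toy lexicon, the probabilities, and the names `Node`, `build_tree`, and `lookahead` are hypothetical, and the scores stand in for bigram probabilities conditioned on a fixed predecessor word.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the prefix tree; arcs are labelled with phonemes."""
    children: dict = field(default_factory=dict)  # phoneme -> Node
    words: list = field(default_factory=list)     # words whose pronunciation ends here


def build_tree(lexicon):
    """Merge pronunciations that share a prefix into a single tree."""
    root = Node()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.children.setdefault(phone, Node())
        node.words.append(word)
    return root


def lookahead(node, logp):
    """Best log-probability of any word reachable from this node.

    A beam search can add this value to a partial hypothesis before
    the word identity is known, so unlikely branches are pruned early.
    """
    best = max((logp[w] for w in node.words), default=-math.inf)
    for child in node.children.values():
        best = max(best, lookahead(child, logp))
    return best


# Hypothetical toy lexicon (word -> phoneme sequence).
lexicon = {
    "speech": ["s", "p", "iy", "ch"],
    "speed": ["s", "p", "iy", "d"],
    "tree": ["t", "r", "iy"],
}
# Hypothetical log-probabilities given one fixed predecessor word.
logp = {
    "speech": math.log(0.5),
    "speed": math.log(0.1),
    "tree": math.log(0.4),
}

root = build_tree(lexicon)
s_score = lookahead(root.children["s"], logp)  # best of "speech" and "speed"
t_score = lookahead(root.children["t"], logp)  # only "tree" is reachable
```

In this sketch "speech" and "speed" share the arcs s, p, iy, so the look-ahead score on those arcs is that of the more probable word. A practical decoder would typically precompute or cache these values per predecessor word rather than recurse on every query.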
Bibliographic reference. Aubert, Xavier L. (1999): "One pass cross word decoding for large vocabularies based on a lexical tree search organization", In EUROSPEECH'99, 1559-1562.