This paper describes the new Philips Research decoder, which performs large-vocabulary continuous speech recognition in a single pass using cross-word acoustic models and an m-gram language model (with m up to 4), in contrast to our previous multi-pass technique. The decoder is based on a time-synchronous beam search and a prefix-tree organization of the lexicon. Cross-word transitions are handled dynamically, and a language-model look-ahead technique is applied using the bigram probabilities. On a variety of speech data, reduced error rates are obtained together with significant speed-ups, confirming the advantage of using all available knowledge sources early. In particular, the search effort of one-pass trigram decoding is only marginally higher than that of bigram decoding, and the integration of cross-word triphones improves overall accuracy by typically 10% relative.
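To make the prefix-tree lexicon and the bigram look-ahead mentioned in the abstract more concrete, here is a minimal Python sketch. It is not the Philips decoder. It assumes the common formulation in which the look-ahead score at a tree node is the maximum bigram probability over all words still reachable from that node, given the predecessor word. The names `Node`, `build_prefix_tree`, and `lm_lookahead`, as well as the toy lexicon and probabilities, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the lexical prefix tree (hypothetical structure)."""
    children: dict = field(default_factory=dict)  # phoneme -> Node
    words: list = field(default_factory=list)     # words whose pronunciation ends here


def build_prefix_tree(lexicon):
    """Merge pronunciations that share a phoneme prefix into one tree."""
    root = Node()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.children.setdefault(phone, Node())
        node.words.append(word)
    return root


def lm_lookahead(root, bigram_given_prev, floor=1e-10):
    """Map each phoneme prefix to the best bigram probability below it.

    bigram_given_prev holds p(w | v) for one fixed predecessor word v.
    Words missing from it get the assumed back-off value `floor`.
    """
    scores = {}

    def visit(node, prefix):
        # Best probability among words ending exactly at this node.
        best = max((bigram_given_prev.get(w, floor) for w in node.words),
                   default=0.0)
        # Propagate the maximum up from the subtrees.
        for phone, child in node.children.items():
            best = max(best, visit(child, prefix + (phone,)))
        scores[prefix] = best
        return best

    visit(root, ())
    return scores


# Toy data (assumed values, for illustration only).
lexicon = {
    "speech": ["s", "p", "iy", "ch"],
    "speed": ["s", "p", "iy", "d"],
    "spot": ["s", "p", "aa", "t"],
    "top": ["t", "aa", "p"],
}
bigram_after_the = {"speech": 0.02, "speed": 0.01, "spot": 0.005, "top": 0.03}

tree = build_prefix_tree(lexicon)
scores = lm_lookahead(tree, bigram_after_the)
```

In a beam search, a score such as `scores[("s", "p", "aa")]` could be combined with the acoustic score of a hypothesis inside the tree, so that unlikely words are pruned before their word end is reached. How the decoder described here combines and caches these scores is not stated in the abstract.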
Cite as: Aubert, X.L. (1999) One pass cross word decoding for large vocabularies based on a lexical tree search organization. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 1559-1562, doi: 10.21437/Eurospeech.1999-133
@inproceedings{aubert99_eurospeech,
  author={Xavier L. Aubert},
  title={{One pass cross word decoding for large vocabularies based on a lexical tree search organization}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={1559--1562},
  doi={10.21437/Eurospeech.1999-133}
}