September 22-25, 1997
The performance of the Philips system for large vocabulary continuous speech recognition has been improved significantly by crossword N-phone modelling, enhanced clustering of HMM-states during training, consistent handling of untrained HMM-states during decoding and a new efficient crossword N-phone M-gram decoding strategy. We report word error rate reductions of up to 18% on various ARPA test sets as compared to our best within-word triphone system, based on Laplacian densities, Viterbi decoding and filterbank-LDA features. The following two issues are addressed: a) Transformation of a tree-organized bigram beam-search decoder into an efficient tree-organized decoder capable of handling long-span acoustic contexts as well as long-span language model contexts. b) State-clustering and generalizing of unseen contexts for the case of Laplacian emission probability density functions.
Bibliographic reference. Beyerlein, Peter / Ullrich, Meinhard / Wilcox, Patricia (1997): "Modelling and decoding of crossword context dependent phones in the Philips large vocabulary continuous speech recognition system", In EUROSPEECH-1997, 1163-1166.