5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Nozomi: A Fast, Memory-Efficient Stack Decoder for LVCSR

Mike Schuster

ATR, Interpreting Telecommunications Laboratories, Japan

This paper describes some of the implementation details of the "Nozomi" stack decoder for LVCSR. The decoder was tested on a Japanese newspaper dictation task with a 5000-word vocabulary. The acoustic models were continuous-density models with 2000 and 3000 states, trained on the JNAS/ASJ corpora. The language model was a 3-gram LM trained on the RWC text corpus. Both models were provided by the IPA group. With these models, the decoder reached more than 95% word accuracy on the standard test set. With computationally cheap acoustic models, we achieved around 89% accuracy in nearly real time on a 300 MHz Pentium II. Using a disk-based LM, total memory usage could be reduced to 4 MB.
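Word accuracy, the figure quoted above, is conventionally defined as (N - S - D - I) / N. Here N is the number of reference words, and S, D and I are the substitutions, deletions and insertions in the best alignment of the recognizer output against the reference. The sketch below is not taken from this paper. It assumes unit costs for all three error types, so the minimal edit distance equals S + D + I. It also assumes both strings are already segmented into whitespace-separated tokens, which Japanese text would first need (for example, into morphemes). The function name `word_accuracy` is hypothetical.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy (N - S - D - I) / N, assuming unit edit costs.

    Both strings are assumed to be pre-segmented into whitespace-separated
    tokens (an assumption; Japanese text needs segmentation first).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion: reference word missing
                cur[j - 1] + 1,          # insertion: extra hypothesis word
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = cur

    errors = prev[-1]  # minimal S + D + I under unit costs
    return (len(ref) - errors) / len(ref)


# One substitution (b -> x) and one insertion (e) against 4 reference words.
print(word_accuracy("a b c d", "a x c d e"))  # prints 0.5
```

Because insertions are counted, word accuracy can become negative when the output contains many extra words. This is one reason it is reported separately from the simpler "percent correct" measure, which ignores insertions.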

