5th International Conference on Spoken Language Processing
This paper describes some of the implementation details of the "Nozomi" stack decoder for large-vocabulary continuous speech recognition (LVCSR). The decoder was tested on a Japanese newspaper dictation task with a 5000-word vocabulary. With continuous-density acoustic models of 2000 and 3000 states trained on the JNAS/ASJ corpora and a 3-gram LM trained on the RWC text corpus (both models were provided by the IPA group), it reached more than 95% word accuracy on the standard test set. With computationally cheap acoustic models, it achieved around 89% accuracy in nearly real time on a 300 MHz Pentium II. With a disk-based LM, total memory usage could be reduced to 4 MB.
Bibliographic reference. Schuster, Mike (1998): "Nozomi -- a fast, memory-efficient stack decoder for LVCSR", In ICSLP-1998, paper 0464.