9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

High-Performance Low-Latency Speech Recognition via Multi-Layered Feature Streaming and Fast Gaussian Computation

Liang Gu, Jian Xue, Xiaodong Cui, Yuqing Gao

IBM T.J. Watson Research Center, USA

Highly accurate speech recognition with very low latency is difficult to achieve, yet it is an important requirement for modern real-time applications such as speech-to-speech translation. We address this problem with an effective and efficient streaming-mode decoding scheme. A novel multi-layered feature streaming method minimizes truncation errors during streaming by optimizing look-ahead parameters. A set of speed-up algorithms is further proposed to accelerate both Gaussian computation and graph search. Experiments show a dramatic reduction in decoding latency with the proposed scheme, while recognition accuracy remains similar to that of utterance-based decoding.

Bibliographic reference. Gu, Liang / Xue, Jian / Cui, Xiaodong / Gao, Yuqing (2008): "High-performance low-latency speech recognition via multi-layered feature streaming and fast Gaussian computation", in INTERSPEECH-2008, 2098-2101.