12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Towards High Performance LVCSR in Speech-to-Speech Translation System on Smart Phones

Jian Xue, Xiaodong Cui, Gregg Daggett, Etienne Marcheret, Bowen Zhou

IBM T.J. Watson Research Center, USA

This paper presents efforts to improve the performance of large vocabulary continuous speech recognition (LVCSR) in a speech-to-speech translation system on smart phones. A variety of techniques are investigated to achieve high accuracy and low latency given constrained resources. These include one-pass streaming-mode decoding for minimum latency; full-covariance acoustic modeling based on bootstrap and model restructuring to improve recognition accuracy with limited training data; and quantized discriminative feature-space transforms and quantized Gaussian mixture models to reduce memory usage with negligible degradation in recognition accuracy. Several optimization methods for increasing recognition speed are also discussed. The proposed techniques, evaluated on the DARPA Transtac datasets, are shown to give good overall performance under the CPU and memory constraints of smart phones.
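
To illustrate the memory-saving idea behind parameter quantization, the sketch below applies uniform 8-bit scalar quantization to a hypothetical vector of Gaussian mean parameters. The abstract does not describe the quantization scheme used in the paper, so the function names, the 8-bit width, and the 40-dimensional random example are all assumptions made for illustration.

```python
import random


def quantize_8bit(values):
    """Uniform scalar quantization of floats to one byte each (illustrative only)."""
    lo, hi = min(values), max(values)
    # Step size between adjacent quantization levels; guard against a constant vector.
    step = (hi - lo) / 255 if hi > lo else 1.0
    # Map each value to the nearest level index, clamped to the byte range.
    codes = bytes(min(255, max(0, round((v - lo) / step))) for v in values)
    return codes, lo, step


def dequantize_8bit(codes, lo, step):
    """Reconstruct approximate float values from the byte codes."""
    return [lo + c * step for c in codes]


# Hypothetical 40-dimensional mean vector of one Gaussian component.
random.seed(0)
means = [random.gauss(0.0, 1.0) for _ in range(40)]

codes, lo, step = quantize_8bit(means)
restored = dequantize_8bit(codes, lo, step)

# Worst-case reconstruction error; bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(means, restored))
print(f"bytes stored: {len(codes)}, step: {step:.5f}, max error: {max_err:.5f}")
```

Under these assumptions, each parameter occupies one byte instead of a 4-byte single-precision float, so storage drops to roughly a quarter (plus the shared offset and step size), while the reconstruction error stays within half a quantization step.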

Bibliographic reference.  Xue, Jian / Cui, Xiaodong / Daggett, Gregg / Marcheret, Etienne / Zhou, Bowen (2011): "Towards high performance LVCSR in speech-to-speech translation system on smart phones", In INTERSPEECH-2011, 2861-2864.