This paper presents our efforts to improve the performance of large-vocabulary continuous speech recognition (LVCSR) in a speech-to-speech translation system on smart phones. A variety of techniques are investigated to achieve high accuracy and low latency given constrained resources. These include one-pass streaming-mode decoding for minimum latency; full-covariance acoustic modeling based on bootstrap and model restructuring to improve recognition accuracy with limited training data; and quantized discriminative feature-space transforms together with quantized Gaussian mixture models to reduce memory usage with negligible degradation in recognition accuracy. Speed optimization methods are also discussed to increase recognition speed. Evaluated on the DARPA Transtac datasets, the proposed techniques are shown to give good overall performance under the CPU and memory constraints of smart phones.
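To give a flavor of why quantizing Gaussian mixture model parameters saves memory, the sketch below shows generic 8-bit scalar quantization of GMM mean vectors with per-dimension scale and offset. This is a minimal illustration of the general idea only, not the paper's specific quantization scheme; all function and variable names are hypothetical.

```python
import numpy as np

def quantize_params(params, bits=8):
    """Scalar-quantize each column (feature dimension) to unsigned integers.

    Returns the quantized codes plus the per-dimension scale and offset
    needed to reconstruct approximate float values.
    """
    levels = (1 << bits) - 1
    lo = params.min(axis=0)
    hi = params.max(axis=0)
    scale = (hi - lo) / levels
    scale[scale == 0] = 1.0          # guard against constant dimensions
    codes = np.round((params - lo) / scale).astype(np.uint8)
    return codes, scale, lo

def dequantize_params(codes, scale, lo):
    """Reconstruct approximate float parameters from the quantized codes."""
    return codes.astype(np.float32) * scale + lo

# Illustrative example: means of a 256-component GMM over 39-dim features.
rng = np.random.default_rng(0)
means = rng.normal(size=(256, 39)).astype(np.float32)

codes, scale, lo = quantize_params(means)
approx = dequantize_params(codes, scale, lo)

# 8-bit storage is 4x smaller than float32, at a small reconstruction error.
print(codes.nbytes, means.nbytes)
print(float(np.abs(means - approx).max()))
```

In practice such lookup-table or scalar quantization trades a tiny loss in likelihood accuracy for a fourfold (or larger) reduction in model footprint, which is what makes on-device acoustic models feasible.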
Bibliographic reference: Xue, Jian / Cui, Xiaodong / Daggett, Gregg / Marcheret, Etienne / Zhou, Bowen (2011): "Towards high performance LVCSR in speech-to-speech translation system on smart phones", in Proceedings of INTERSPEECH 2011, pp. 2861-2864.