7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Distributed Speech Recognition Using Noise-Robust MFCC and TRAPS-Estimated Manner Features

Pratibha Jain (1), Hynek Hermansky (1), Brian Kingsbury (2)

(1) Oregon Health & Science University, USA; (2) IBM T.J. Watson Research Center, USA

In this paper, we investigate the use of TemPoRal PatternS (TRAPS) classifiers for estimating manner of articulation features on the small-vocabulary Aurora-2002 database. By combining a stream of TRAPS-estimated manner features with a stream of noise-robust MFCC features (earlier proposed in the Aurora-2002 evaluation by OGI, ICSI and Qualcomm), we obtain an average absolute improvement of 0.4% to 1.0% in word recognition accuracy over noise-robust MFCC baseline features on Aurora tasks. This yields an average relative improvement of 54% over the reference end-pointed MFCC baseline. Estimation of the manner features can be performed on the server without increasing the terminal-side computational complexity in a distributed speech recognition (DSR) system.


Bibliographic reference. Jain, Pratibha / Hermansky, Hynek / Kingsbury, Brian (2002): "Distributed speech recognition using noise-robust MFCC and TRAPS-estimated manner features", In ICSLP-2002, 473-476.