Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Using MLP Features in SRI's Conversational Speech Recognition System

Qifeng Zhu (1), Andreas Stolcke (2), Barry Y. Chen (1), Nelson Morgan (1)

(1) International Computer Science Institute, USA; (2) SRI International, USA

We describe the development of a speech recognition system for conversational telephone speech (CTS) that incorporates acoustic features estimated by multilayer perceptrons (MLPs). The acoustic features are based on frame-level phone posterior probabilities, obtained by merging two different MLP estimators: one based on PLP-Tandem features, the other based on hidden activation TRAPs (HATs) features. This paper focuses on the challenges that arise when incorporating these nonstandard features into a full-scale speech-to-text (STT) system, as used by SRI in the Fall 2004 DARPA STT evaluations. First, we developed a series of time-saving techniques for training feature MLPs on 1800 hours of speech. Second, we investigated which components of a multipass, multi-front-end recognition system are most profitably augmented with MLP features for best overall performance. The final system achieved a 2% absolute (10% relative) WER reduction over a comparable baseline system that did not include Tandem/HATs MLP features.
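The abstract's core idea — merging frame-level phone posteriors from two MLP estimators and turning them into acoustic features — can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the function names, the equal-weight linear combination, and the 25-dimensional projection are all assumptions made here for concreteness; Tandem-style post-processing is commonly described as taking log posteriors followed by a PCA/KLT decorrelation.

```python
import numpy as np

def combine_posteriors(p_tandem, p_hats, w=0.5):
    """Merge two frame-level phone posterior estimates.

    Hypothetical sketch: a weighted linear combination of the two
    MLPs' per-frame posteriors, renormalized so each frame sums to 1.
    (The paper's actual merging rule may differ.)
    """
    merged = w * p_tandem + (1.0 - w) * p_hats
    return merged / merged.sum(axis=1, keepdims=True)

def tandem_features(posteriors, n_components=25, eps=1e-10):
    """Tandem-style post-processing: log posteriors followed by a
    PCA (KLT) projection to decorrelate and reduce dimensionality."""
    logp = np.log(posteriors + eps)
    logp = logp - logp.mean(axis=0)            # center each dimension
    cov = np.cov(logp, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)           # eigenvalues ascending
    basis = vecs[:, ::-1][:, :n_components]    # keep top components
    return logp @ basis

# Toy demo: random "posteriors" over 46 phone classes for 100 frames.
rng = np.random.default_rng(0)
p_tandem = rng.dirichlet(np.ones(46), size=100)
p_hats = rng.dirichlet(np.ones(46), size=100)
merged = combine_posteriors(p_tandem, p_hats)
feats = tandem_features(merged)
```

In a real system the resulting features would typically be appended to standard cepstral features before HMM training; here the demo only shows the shape of the transformation.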


Bibliographic reference.  Zhu, Qifeng / Stolcke, Andreas / Chen, Barry Y. / Morgan, Nelson (2005): "Using MLP features in SRI's conversational speech recognition system", In INTERSPEECH-2005, 2141-2144.