ISCA Archive Interspeech 2005

Using MLP features in SRI's conversational speech recognition system

Qifeng Zhu, Andreas Stolcke, Barry Y. Chen, Nelson Morgan

We describe the development of a speech recognition system for conversational telephone speech (CTS) that incorporates acoustic features estimated by multilayer perceptrons (MLPs). The acoustic features are based on frame-level phone posterior probabilities, obtained by merging two different MLP estimators, one based on PLP-Tandem features, the other based on hidden activation TRAPs (HATs) features. This paper focuses on the challenges arising when incorporating these nonstandard features into a full-scale speech-to-text (STT) system, as used by SRI in the Fall 2004 DARPA STT evaluations. First, we developed a series of time-saving techniques for training feature MLPs on 1800 hours of speech. Second, we investigated which components of a multipass, multi-front-end recognition system are most profitably augmented with MLP features for best overall performance. The final system achieved a 2% absolute (10% relative) word error rate (WER) reduction over a comparable baseline system that did not include Tandem/HATs MLP features.
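The relationship between the absolute and relative WER figures quoted above can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the baseline WER it derives (about 20%) is an inference from the two rounded figures, not a number stated in the abstract.

```python
def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction, as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline(absolute_reduction, relative_fraction):
    """Baseline WER implied by an absolute and a relative reduction.

    Since relative = absolute / baseline, baseline = absolute / relative.
    """
    return absolute_reduction / relative_fraction


# Figures from the abstract: 2 points absolute, 10% relative.
# Both are rounded, so the result is only approximate.
baseline = implied_baseline(2.0, 0.10)
improved = baseline - 2.0

print(f"implied baseline WER: about {baseline:.0f}%")
print(f"implied WER with MLP features: about {improved:.0f}%")
print(f"relative reduction check: {relative_reduction(baseline, improved):.0%}")
```

Because both reported figures are rounded, the true baseline could differ from this estimate by a point or more.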

doi: 10.21437/Interspeech.2005-695

Cite as: Zhu, Q., Stolcke, A., Chen, B.Y., Morgan, N. (2005) Using MLP features in SRI's conversational speech recognition system. Proc. Interspeech 2005, 2141-2144, doi: 10.21437/Interspeech.2005-695

  author={Qifeng Zhu and Andreas Stolcke and Barry Y. Chen and Nelson Morgan},
  title={{Using MLP features in SRI's conversational speech recognition system}},
  booktitle={Proc. Interspeech 2005},