ISCA Archive Interspeech 2005

Improved MLP structures for data-driven feature extraction for ASR

Qifeng Zhu, Barry Y. Chen, Frantisek Grezl, Nelson Morgan

In this paper, we present our recent progress on multi-layer perceptron (MLP) based data-driven feature extraction using improved MLP structures. Four-layer MLPs are used in this study. Different signal processing methods are applied before the input layer of the MLP. We show that the first hidden layer of a four-layer MLP is able to detect some basic patterns from the time-frequency plane. KLT-based dimension reduction along time is applied as a modulation frequency filter. The new feature extraction was tested on a large vocabulary continuous speech recognition (LVCSR) task using the NIST 2001 evaluation set. We achieved an 11.6% relative word error rate (WER) reduction compared to the traditional PLP-based baseline feature. This is also a significant improvement compared to our previously published results on the same task using MLP-based features with three-layer MLPs.
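
The KLT-based dimension reduction along time mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the window length (51 frames), the number of retained components (8), the synthetic data, and the function names `klt_basis` and `klt_reduce` are all assumptions made for illustration. The idea shown is that each retained KLT basis vector is a fixed set of weights over a temporal window, so projecting a band's trajectory onto it behaves like a filter applied along time.

```python
import numpy as np

def klt_basis(trajectories, k):
    """Estimate the mean and the top-k KLT basis vectors.

    trajectories: array of shape (n_examples, n_frames); each row is the
    temporal trajectory of one frequency band (hypothetical layout).
    """
    mean = trajectories.mean(axis=0)
    centered = trajectories - mean
    # Sample covariance across the time axis, shape (n_frames, n_frames).
    cov = centered.T @ centered / (len(trajectories) - 1)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]  # indices of the k largest eigenvalues
    return mean, eigvecs[:, order]

def klt_reduce(trajectories, mean, basis):
    """Project trajectories onto the KLT basis (n_frames -> k values)."""
    return (trajectories - mean) @ basis

# Synthetic stand-in data: random walks, which are dominated by slow variation.
rng = np.random.default_rng(0)
n_examples, n_frames, k = 200, 51, 8  # assumed sizes, not from the paper
trajectories = np.cumsum(rng.standard_normal((n_examples, n_frames)), axis=1)

mean, basis = klt_basis(trajectories, k)
reduced = klt_reduce(trajectories, mean, basis)
print(reduced.shape)
```

In this sketch the reduced vectors, rather than the raw temporal trajectories, would be what is passed on toward the MLP input; how the paper combines bands and configures the window is not stated in the abstract.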


doi: 10.21437/Interspeech.2005-692

Cite as: Zhu, Q., Chen, B.Y., Grezl, F., Morgan, N. (2005) Improved MLP structures for data-driven feature extraction for ASR. Proc. Interspeech 2005, 2129-2132, doi: 10.21437/Interspeech.2005-692

@inproceedings{zhu05b_interspeech,
  author={Qifeng Zhu and Barry Y. Chen and Frantisek Grezl and Nelson Morgan},
  title={{Improved MLP structures for data-driven feature extraction for ASR}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={2129--2132},
  doi={10.21437/Interspeech.2005-692}
}