14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Improved Feature Processing for Deep Neural Networks

Shakti P. Rath (1), Daniel Povey (2), Karel Veselý (1), Jan Černocký (1)

(1) Brno University of Technology, Czech Republic
(2) Johns Hopkins University, USA

In this paper, we investigate alternative ways of processing MFCC-based features to use as the input to Deep Neural Networks (DNNs). Our baseline is a conventional feature pipeline that involves splicing the 13-dimensional front-end MFCCs across 9 frames, followed by applying LDA to reduce the dimension to 40 and then further decorrelation using MLLT. Confirming the results of other groups, we show that speaker adaptation applied on top of these features using feature-space MLLR is helpful. The fact that the number of parameters of a DNN is not strongly sensitive to the input feature dimension (unlike GMM-based systems) motivated us to investigate ways to increase the dimension of the features. In this paper, we investigate several approaches to derive higher-dimensional features and verify their performance with DNNs. Our best result is obtained by splicing our baseline 40-dimensional speaker-adapted features again across 9 frames, followed by reducing the dimension to 200 or 300 using another LDA. Our final result is about 3% absolute better than our best GMM system, which is a discriminatively trained model.
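The two core operations in the pipeline described above are frame splicing (stacking each frame with its neighbours) and an LDA projection to a lower dimension. The sketch below illustrates both on the abstract's numbers (13-dim MFCCs spliced across 9 frames gives 117 dims, then LDA to 40); it is a minimal illustration, not the authors' implementation, and the edge-replication choice at utterance boundaries and the diagonal regularization of the within-class scatter are assumptions.

```python
import numpy as np

def splice(feats, context=4):
    """Stack each frame with +/-context neighbours (9 frames total for
    context=4); edge frames are replicated at utterance boundaries
    (an assumed, common convention)."""
    T, d = feats.shape
    offsets = np.arange(-context, context + 1)
    idx = np.clip(offsets[None, :] + np.arange(T)[:, None], 0, T - 1)
    return feats[idx].reshape(T, (2 * context + 1) * d)

def lda_matrix(feats, labels, out_dim):
    """Estimate an LDA projection from labelled frames by solving the
    generalized eigenproblem Sb v = lambda Sw v (within- vs.
    between-class scatter). Regularization of Sw is an assumption to
    keep the solve stable when Sw is rank-deficient."""
    d = feats.shape[1]
    mu = feats.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(labels):
        x = feats[labels == c]
        mc = x.mean(axis=0)
        Sw += (x - mc).T @ (x - mc)
        Sb += len(x) * np.outer(mc - mu, mc - mu)
    Sw += 1e-6 * np.eye(d)  # assumed diagonal smoothing
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)
    return evecs.real[:, order[:out_dim]]

# Illustrative usage with random stand-ins for MFCCs and frame labels:
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((100, 13))          # 100 frames of 13-dim MFCCs
labels = rng.integers(0, 5, size=100)          # hypothetical class labels
spliced = splice(mfcc, context=4)              # -> (100, 117)
reduced = spliced @ lda_matrix(spliced, labels, out_dim=40)  # -> (100, 40)
```

The paper's best configuration repeats this pattern a second time: the 40-dimensional speaker-adapted features are spliced again across 9 frames (360 dims) and a second LDA reduces them to 200 or 300.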


Bibliographic reference.  Rath, Shakti P. / Povey, Daniel / Veselý, Karel / Černocký, Jan (2013): "Improved feature processing for deep neural networks", In INTERSPEECH-2013, 109-113.