Deep neural networks (DNNs) have been successfully applied to a variety of automatic speech recognition (ASR) tasks, both in discriminative feature extraction and in hybrid acoustic modeling scenarios. The development of improved loss functions and regularization approaches has resulted in consistent reductions in ASR word error rates (WERs). This paper presents a manifold-learning-based regularization framework for DNN training. The associated techniques attempt to preserve the underlying low-dimensional manifold-based relationships among speech feature vectors as part of the optimization procedure for estimating network parameters. This is achieved by imposing manifold-based locality-preserving constraints on the outputs of the network. The techniques are presented in the context of a bottleneck DNN architecture for feature extraction in a tandem configuration. The ASR WER obtained using these networks is evaluated on a speech-in-noise task and compared to that obtained using DNN bottleneck networks trained without manifold constraints.
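The locality-preserving penalty described above can be sketched as follows. This is an illustrative reconstruction, not the paper's exact formulation: it assumes a Gaussian-kernel affinity restricted to k nearest neighbors and a penalty of the form sum_ij w_ij ||z_i - z_j||^2 on network outputs z, added to the task loss with a weight lambda; the function names are hypothetical.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0, k=3):
    """Affinity w_ij = exp(-||x_i - x_j||^2 / sigma) kept only for the k
    nearest neighbors of each point (symmetrized); zero elsewhere.
    This encodes local neighborhood relationships among input features."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = np.exp(-d2 / sigma)
    np.fill_diagonal(W, 0.0)  # no self-affinity
    for i in range(X.shape[0]):
        # zero out everything except the k largest affinities in row i
        drop = np.argsort(-W[i])[k:]
        W[i, drop] = 0.0
    return np.maximum(W, W.T)  # symmetrize

def manifold_penalty(Z, W):
    """sum_ij w_ij * ||z_i - z_j||^2 over network outputs Z: large when
    points that are neighbors in input space are mapped far apart."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return float((W * d2).sum())

def regularized_loss(task_loss, Z, W, lam=0.1):
    """Total training objective: task loss plus weighted manifold penalty
    (lam is the regularization weight, an assumed hyperparameter)."""
    return task_loss + lam * manifold_penalty(Z, W)
```

In a training loop, W would be built once from the input feature vectors of a mini-batch, and the penalty applied to the corresponding bottleneck-layer outputs so that gradients push the network to preserve input-space locality.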
Bibliographic reference. Tomar, Vikrant Singh / Rose, Richard C. (2014): "Manifold regularized deep neural networks", In INTERSPEECH-2014, 348-352.