Stimulated Deep Neural Network for Speech Recognition

Chunyang Wu, Penny Karanasou, Mark J.F. Gales, Khe Chai Sim


Deep neural networks (DNNs) and deep learning approaches yield state-of-the-art performance in a range of tasks, including speech recognition. However, the parameters of the network are hard to analyze, making network regularization and robust adaptation challenging. Stimulated training has recently been proposed to address this problem by encouraging the node activation outputs in regions of the network to be related. This kind of information aids visualization of the network, and also has the potential to improve regularization and adaptation. This paper investigates stimulated training of DNNs for both of these purposes. These schemes take advantage of the smoothness constraints that stimulated training offers. The approaches are evaluated on two large vocabulary speech recognition tasks: a U.S. English broadcast news (BN) task and a Javanese conversational telephone speech task from the IARPA Babel program. Stimulated DNN training yields consistent performance gains on both tasks over unstimulated baselines. On the BN task, the proposed smoothing approach is also applied to rapid adaptation, again outperforming the standard adaptation scheme.


DOI: 10.21437/Interspeech.2016-580

Cite as

Wu, C., Karanasou, P., Gales, M.J.F., Sim, K.C. (2016) Stimulated Deep Neural Network for Speech Recognition. Proc. Interspeech 2016, 400-404.

BibTeX
@inproceedings{Wu+2016,
author={Chunyang Wu and Penny Karanasou and Mark J.F. Gales and Khe Chai Sim},
title={Stimulated Deep Neural Network for Speech Recognition},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-580},
url={http://dx.doi.org/10.21437/Interspeech.2016-580},
pages={400--404}
}