Adversarial Multi-Task Learning of Deep Neural Networks for Robust Speech Recognition

Yusuke Shinohara


A method of learning deep neural networks (DNNs) for noise-robust speech recognition is proposed. It is widely known that the representations (activations) of well-trained DNNs are highly invariant to noise, especially in higher layers, and that such invariance leads to the noise robustness of DNNs. However, little is known about how to enhance this invariance of representations, which is key to improving robustness. In this paper, we propose adversarial multi-task learning of DNNs to explicitly enhance the invariance of representations. Specifically, a primary task of senone classification and a secondary task of domain (noise condition) classification are jointly solved. Unlike standard multi-task learning, the representation is learned adversarially with respect to the secondary task, inducing a representation from which the domain cannot be classified accurately. As a result, a senone-discriminative and domain-invariant representation is obtained, which leads to improved robustness of DNNs. Experimental results on a noise-corrupted Wall Street Journal data set show the effectiveness of the proposed method.
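The core mechanism described in the abstract, learning the shared representation adversarially with respect to the domain classifier, is commonly implemented with a gradient reversal layer: the domain head's gradient is negated (and scaled) before flowing into the shared layers. Below is a minimal NumPy sketch of this idea. The two-layer architecture, layer sizes, `lam` (reversal scale), and learning rate are all illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Tiny shared feature extractor and two linear heads (assumed sizes).
D_in, D_h, n_senones, n_domains = 8, 16, 4, 3
W_shared = rng.normal(0, 0.1, (D_in, D_h))   # shared representation
W_senone = rng.normal(0, 0.1, (D_h, n_senones))  # primary task head
W_domain = rng.normal(0, 0.1, (D_h, n_domains))  # secondary task head

lam, lr = 0.5, 0.1  # gradient-reversal scale and step size (assumed)

def train_step(x, y_senone, y_domain):
    global W_shared, W_senone, W_domain
    # Forward pass.
    h = np.maximum(0.0, x @ W_shared)   # shared representation (ReLU)
    p_s = softmax(h @ W_senone)         # senone posteriors
    p_d = softmax(h @ W_domain)         # domain posteriors

    n = x.shape[0]
    # Cross-entropy gradients w.r.t. each head's logits.
    g_s = (p_s - np.eye(n_senones)[y_senone]) / n
    g_d = (p_d - np.eye(n_domains)[y_domain]) / n

    # Both heads minimize their own classification loss.
    dW_senone = h.T @ g_s
    dW_domain = h.T @ g_d

    # Gradient reversal: the domain gradient entering the shared
    # layer is negated and scaled by lam, so the shared features are
    # pushed toward LOW domain-classification accuracy, i.e. toward
    # domain invariance, while staying senone-discriminative.
    dh = g_s @ W_senone.T - lam * (g_d @ W_domain.T)
    dh *= (h > 0)                        # ReLU backprop
    dW_shared = x.T @ dh

    W_senone -= lr * dW_senone
    W_domain -= lr * dW_domain
    W_shared -= lr * dW_shared

# A few steps on random data, just to demonstrate the mechanics.
for _ in range(100):
    x = rng.normal(size=(32, D_in))
    train_step(x,
               rng.integers(0, n_senones, 32),
               rng.integers(0, n_domains, 32))
```

In a full system the scale of the reversed gradient (here `lam`) trades off senone discriminability against domain invariance; the sketch keeps it fixed for clarity.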


DOI: 10.21437/Interspeech.2016-879

Cite as

Shinohara, Y. (2016) Adversarial Multi-Task Learning of Deep Neural Networks for Robust Speech Recognition. Proc. Interspeech 2016, 2369-2372.

Bibtex
@inproceedings{Shinohara2016,
  author={Yusuke Shinohara},
  title={Adversarial Multi-Task Learning of Deep Neural Networks for Robust Speech Recognition},
  year={2016},
  booktitle={Interspeech 2016},
  doi={10.21437/Interspeech.2016-879},
  url={http://dx.doi.org/10.21437/Interspeech.2016-879},
  pages={2369--2372}
}