Factorized Linear Input Network for Acoustic Model Adaptation in Noisy Conditions

Dung T. Tran, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani


Deep neural network (DNN) based acoustic models have achieved remarkable performance on many speech recognition tasks. However, recognition performance remains poor in noisy conditions. To address this issue, a speech enhancement front-end is often used before recognition. Such a front-end can reduce noise, but a mismatch may remain due to differences between training and testing conditions and to imperfections of the enhancement front-end. Acoustic model adaptation can be used to mitigate such a mismatch. In this paper, we investigate an extension of the linear input network (LIN) adaptation framework, where the feature transformation is realized as a weighted combination of affine transforms of the enhanced input features. The weights are derived from a vector characterizing the noise conditions. We tested our approach on the real data set of the CHiME-3 challenge task, confirming its effectiveness.
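The factorized transform described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name, the softmax used to map the noise-condition vector to combination weights, and the parameter shapes are all assumptions.

```python
import numpy as np

def factorized_lin(x, noise_vec, As, bs, W, c):
    """Sketch of a factorized linear input transform.

    x         : enhanced input feature vector, shape (d,)
    noise_vec : vector characterizing the noise condition, shape (m,)
    As, bs    : K affine transforms, shapes (K, d, d) and (K, d)
    W, c      : parameters mapping the noise vector to K weights
                (hypothetical parameterization)
    """
    # Combination weights derived from the noise-condition vector;
    # softmax normalization is one plausible choice (assumption).
    logits = W @ noise_vec + c                      # shape (K,)
    alpha = np.exp(logits - logits.max())
    alpha /= alpha.sum()
    # Weighted combination of K affine transforms of the input features
    return sum(a * (A @ x + b) for a, A, b in zip(alpha, As, bs))
```

With K = 1 the weights sum to one and the transform reduces to a plain LIN-style affine transform, which is the baseline the paper extends.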


DOI: 10.21437/Interspeech.2016-732

Cite as

Tran, D.T., Delcroix, M., Ogawa, A., Nakatani, T. (2016) Factorized Linear Input Network for Acoustic Model Adaptation in Noisy Conditions. Proc. Interspeech 2016, 3813-3817.

Bibtex
@inproceedings{Tran+2016,
author={Dung T. Tran and Marc Delcroix and Atsunori Ogawa and Tomohiro Nakatani},
title={Factorized Linear Input Network for Acoustic Model Adaptation in Noisy Conditions},
year={2016},
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-732},
url={http://dx.doi.org/10.21437/Interspeech.2016-732},
pages={3813--3817}
}