Semi-Supervised Speaker Adaptation for In-Vehicle Speech Recognition with Deep Neural Networks

Wonkyum Lee, Kyu J. Han, Ian Lane


In this paper, we present a new i-vector based speaker adaptation method for automatic speech recognition with deep neural networks, focusing on in-vehicle scenarios. Rather than augmenting i-vectors to acoustic feature vectors to form concatenated input vectors for adapting the neural network acoustic model parameters, our proposed method performs a feature-space transformation with smaller transformation neural networks dedicated to acoustic feature vectors and i-vectors, respectively, followed by a layer that linearly combines the network outputs. This feature-space transformation is learned via semi-supervised learning without any parameter change in the original deep neural network acoustic model. Experimental results show that our proposed method achieves 18.3% relative improvement in terms of word error rate compared to the speaker-independent performance, and verify that it has the potential to replace the well-known feature-space Maximum Likelihood Linear Regression (fMLLR) in in-vehicle speech recognition with deep neural networks.
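The sketch below is a minimal illustration (not the authors' implementation) of the adaptation architecture the abstract describes: one small network transforms the acoustic feature vector, a second small network transforms the speaker i-vector, and a linear combination layer merges their outputs into an adapted feature vector that is fed to the frozen DNN acoustic model. All layer sizes, dimensions, and activation choices are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn

class FeatureSpaceTransform(nn.Module):
    """Hypothetical sketch of the described feature-space transformation.

    Only this module would be trained (semi-supervised, e.g. on first-pass
    decoding hypotheses); the original DNN acoustic model stays frozen.
    """

    def __init__(self, feat_dim=40, ivec_dim=100, hidden_dim=256):
        super().__init__()
        # Small transformation network for acoustic feature vectors
        # (dimensions are assumptions for illustration).
        self.feat_net = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, feat_dim),
        )
        # Small transformation network for the speaker i-vector.
        self.ivec_net = nn.Sequential(
            nn.Linear(ivec_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, feat_dim),
        )
        # Layer that linearly combines the two network outputs.
        self.combine = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, feats, ivector):
        # feats: (batch, feat_dim); ivector: (batch, ivec_dim)
        merged = torch.cat([self.feat_net(feats), self.ivec_net(ivector)], dim=-1)
        # Adapted features to pass into the frozen DNN acoustic model.
        return self.combine(merged)

# Usage sketch: adapted features replace the original inputs of the
# speaker-independent acoustic model, whose parameters are left unchanged.
transform = FeatureSpaceTransform()
feats = torch.randn(8, 40)      # a batch of acoustic feature vectors
ivector = torch.randn(8, 100)   # the corresponding speaker i-vectors
adapted = transform(feats, ivector)
```

One plausible reading of "a layer of linear combination of the network outputs" is a simple weighted sum rather than the concatenation-plus-linear layer used above; either variant keeps the acoustic model parameters untouched, which is the key property stated in the abstract.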


DOI: 10.21437/Interspeech.2016-1625

Cite as

Lee, W., Han, K.J., Lane, I. (2016) Semi-Supervised Speaker Adaptation for In-Vehicle Speech Recognition with Deep Neural Networks. Proc. Interspeech 2016, 3843-3847.

BibTeX
@inproceedings{Lee+2016,
author={Wonkyum Lee and Kyu J. Han and Ian Lane},
title={Semi-Supervised Speaker Adaptation for In-Vehicle Speech Recognition with Deep Neural Networks},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1625},
url={http://dx.doi.org/10.21437/Interspeech.2016-1625},
pages={3843--3847}
}