Speaker Normalization Through Feature Shifting of Linearly Transformed i-Vector

Jahyun Goo, Younggwan Kim, Hyungjun Lim, Hoirin Kim


In this paper, we propose a simple speaker normalization method for deep neural network (DNN) acoustic models in automatic speech recognition, using i-vectors, the state-of-the-art technique for speaker recognition. Many techniques already use i-vectors for speaker adaptation or for reducing the speaker variability of DNN acoustic models. However, to add speaker information to the acoustic features, most of these techniques must train a large number of parameters even though the dimensionality of the i-vector is quite small. We apply a component-wise shift to the acoustic features using a linearly transformed i-vector, and achieve better performance than typical approaches. On top of that, we propose modifying this structure to adapt each frame of the features, which reduces the number of parameters. Experiments were conducted on the TED-LIUM release-1 corpus, and the proposed method showed some performance gains.
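
The component-wise shift described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `shift_features`, the dimensions (40-dimensional features, 100-dimensional i-vector), and the randomly initialized matrix `W` are assumptions made for illustration only. In a real system the transform would presumably be learned together with the acoustic model.

```python
import numpy as np


def shift_features(features, ivector, W):
    """Shift every frame by a linear transform of the speaker i-vector.

    features: (T, D) acoustic feature frames
    ivector:  (K,)   speaker i-vector
    W:        (D, K) linear transform (hypothetical; random here, not trained)
    """
    shift = W @ ivector          # (D,) one offset per feature component
    return features + shift      # broadcast: same shift added to all T frames


# Illustrative sizes only; none of these values come from the paper.
rng = np.random.default_rng(0)
T, D, K = 5, 40, 100
features = rng.standard_normal((T, D))
ivector = rng.standard_normal(K)
W = 0.01 * rng.standard_normal((D, K))

normalized = shift_features(features, ivector, W)
```

Under these assumptions the only parameters are the D x K entries of `W`, which depends on the feature and i-vector dimensions rather than on the width of a hidden layer.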


DOI: 10.21437/Interspeech.2016-819

Cite as

Goo, J., Kim, Y., Lim, H., Kim, H. (2016) Speaker Normalization Through Feature Shifting of Linearly Transformed i-Vector. Proc. Interspeech 2016, 3489-3493.

Bibtex
@inproceedings{Goo+2016,
author={Jahyun Goo and Younggwan Kim and Hyungjun Lim and Hoirin Kim},
title={Speaker Normalization Through Feature Shifting of Linearly Transformed i-Vector},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-819},
url={http://dx.doi.org/10.21437/Interspeech.2016-819},
pages={3489--3493}
}