ISCA Archive Interspeech 2015

I-vector estimation using informative priors for adaptation of deep neural networks

Penny Karanasou, Mark J. F. Gales, Philip C. Woodland

I-vectors are a well-known low-dimensional representation of speaker space and are becoming increasingly popular in adaptation of state-of-the-art deep neural network (DNN) acoustic models. One advantage of i-vectors is that they can be used with very little data, for example a single utterance. However, to improve the robustness of i-vector estimates with limited data, a prior is often used. Traditionally, a standard normal prior is applied to i-vectors, but this is not well suited to the increased variability of short utterances. This paper proposes a more informative prior, derived from the training data. As well as aiming to reduce the non-Gaussian behaviour of the i-vector space, it allows prior information at different levels, for example gender, to be used. Experiments on a US English Broadcast News (BN) transcription task for speaker and utterance i-vector adaptation show that more informative priors reduce the sensitivity to the quantity of data used to estimate the i-vector. The best configuration for this task was utterance-level test i-vectors enhanced with informative priors, which gave a 13% relative reduction in word error rate over the baseline (no i-vectors) and a 5% relative reduction over utterance-level test i-vectors with a standard prior.
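
The abstract does not state the estimation formula, so the following is only a minimal sketch of how a Gaussian prior typically enters an i-vector point estimate. It assumes the common total-variability formulation with diagonal covariances, and is not taken from the paper. The function name `map_ivector`, the array layouts, and the `prior_mean`/`prior_cov` arguments are all hypothetical. Leaving the prior at its defaults gives the standard normal prior; passing a mean and covariance (for example, statistics of training-set i-vectors, possibly per gender) gives one possible form of a more informative prior.

```python
import numpy as np


def map_ivector(T, sigma_inv, N, F, prior_mean=None, prior_cov=None):
    """MAP point estimate of an i-vector under a Gaussian prior (sketch).

    Assumed (hypothetical) layouts:
      T         : (C, D, R) per-component blocks of the total-variability matrix
      sigma_inv : (C, D)    inverse diagonal covariances of the components
      N         : (C,)      zero-order (occupancy) statistics
      F         : (C, D)    centred first-order statistics
    """
    R = T.shape[2]
    # Defaults reproduce the standard normal prior N(0, I).
    if prior_mean is None:
        prior_mean = np.zeros(R)
    if prior_cov is None:
        prior_cov = np.eye(R)
    prior_prec = np.linalg.inv(prior_cov)

    # Sigma_c^{-1} T_c for every component c.
    weighted_T = T * sigma_inv[:, :, None]
    # Posterior precision: prior precision + sum_c N_c T_c^T Sigma_c^{-1} T_c.
    precision = prior_prec + np.einsum("c,cdr,cds->rs", N, weighted_T, T)
    # Linear term: prior precision * prior mean + sum_c T_c^T Sigma_c^{-1} F_c.
    linear = prior_prec @ prior_mean + np.einsum("cdr,cd->r", weighted_T, F)
    return np.linalg.solve(precision, linear)
```

In this sketch the prior precision is added to a data term that grows with the occupancy counts `N`. With very little data the estimate therefore stays close to the prior mean, and with more data the prior's influence shrinks. This is the mechanism by which the choice of prior matters most for short utterances.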


doi: 10.21437/Interspeech.2015-604

Cite as: Karanasou, P., Gales, M.J.F., Woodland, P.C. (2015) I-vector estimation using informative priors for adaptation of deep neural networks. Proc. Interspeech 2015, 2872-2876, doi: 10.21437/Interspeech.2015-604

@inproceedings{karanasou15_interspeech,
  author={Penny Karanasou and Mark J. F. Gales and Philip C. Woodland},
  title={{I-vector estimation using informative priors for adaptation of deep neural networks}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={2872--2876},
  doi={10.21437/Interspeech.2015-604}
}