ISCA Archive Interspeech 2015

Robust i-vector based adaptation of DNN acoustic model for speech recognition

Sri Garimella, Arindam Mandal, Nikko Strom, Bjorn Hoffmeister, Spyros Matsoukas, Sree Hari Krishnan Parthasarathi

In the past, conventional i-vectors based on a Universal Background Model (UBM) have been successfully used as input features to adapt a Deep Neural Network (DNN) Acoustic Model (AM) for Automatic Speech Recognition (ASR). In contrast, this paper introduces Hidden Markov Model (HMM) based i-vectors, which are estimated using HMM state alignment information from an ASR system. Further, we propose passing these HMM-based i-vectors through an explicit non-linear hidden layer of a DNN before combining them with standard acoustic features, such as log filter bank energies (LFBEs). To improve robustness to mismatched adaptation data, we also propose estimating i-vectors in a causal fashion for training the DNN, restricting the connectivity among hidden nodes in the DNN, and applying a max-pool non-linearity at selected hidden nodes. In our experiments, these techniques yield about 5-7% relative word error rate (WER) improvement over the baseline speaker-independent system in the matched condition, and a substantial WER reduction for mismatched adaptation data.
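
The input path described in the abstract can be pictured with a minimal sketch: the i-vector goes through its own non-linear hidden layer, a max-pool is applied over groups of those hidden nodes, and the result is combined with the LFBE frame. Everything specific here is an assumption made for illustration and is not taken from the paper: the sigmoid non-linearity, pooling over consecutive groups, combination by concatenation, all dimensions, and the helper names `affine`, `max_pool`, and `adapted_input`.

```python
import math
import random


def affine(x, weights, bias):
    """Return weights @ x + bias using plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sigmoid(x):
    """Element-wise logistic function (assumed non-linearity)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def max_pool(x, group_size):
    """Take the max over consecutive, non-overlapping groups of nodes."""
    return [max(x[i:i + group_size]) for i in range(0, len(x), group_size)]


def adapted_input(lfbe_frame, ivector, weights, bias, group_size):
    """Pass the i-vector through its own hidden layer, max-pool the
    activations, then append them to the LFBE frame (assumed combination)."""
    hidden = sigmoid(affine(ivector, weights, bias))
    pooled = max_pool(hidden, group_size)
    return lfbe_frame + pooled


# Hypothetical sizes, chosen only to keep the example small.
rng = random.Random(0)
LFBE_DIM, IVEC_DIM, HIDDEN_DIM, GROUP = 20, 8, 12, 3

weights = [[rng.uniform(-0.1, 0.1) for _ in range(IVEC_DIM)]
           for _ in range(HIDDEN_DIM)]
bias = [0.0] * HIDDEN_DIM
lfbe_frame = [rng.gauss(0.0, 1.0) for _ in range(LFBE_DIM)]
ivector = [rng.gauss(0.0, 1.0) for _ in range(IVEC_DIM)]

features = adapted_input(lfbe_frame, ivector, weights, bias, GROUP)
print(len(features))  # 20 LFBE values + 12 / 3 pooled values = 24
```

Because the i-vector branch has its own weights, the acoustic features pass through unchanged, and the pooled branch output stays bounded even when the i-vector is poorly estimated. This is one plausible reading of why such a layout could help with mismatched adaptation data, not a claim about the authors' implementation.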

doi: 10.21437/Interspeech.2015-605

Cite as: Garimella, S., Mandal, A., Strom, N., Hoffmeister, B., Matsoukas, S., Parthasarathi, S.H.K. (2015) Robust i-vector based adaptation of DNN acoustic model for speech recognition. Proc. Interspeech 2015, 2877-2881, doi: 10.21437/Interspeech.2015-605

  author={Sri Garimella and Arindam Mandal and Nikko Strom and Bjorn Hoffmeister and Spyros Matsoukas and Sree Hari Krishnan Parthasarathi},
  title={{Robust i-vector based adaptation of DNN acoustic model for speech recognition}},
  booktitle={Proc. Interspeech 2015},