16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Robust i-Vector Based Adaptation of DNN Acoustic Model for Speech Recognition

Sri Garimella (1), Arindam Mandal (2), Nikko Strom (2), Bjorn Hoffmeister (2), Spyros Matsoukas (2), Sree Hari Krishnan Parthasarathi (2)

(1), India
(2), USA

In the past, conventional i-vectors based on a Universal Background Model (UBM) have been successfully used as input features to adapt a Deep Neural Network (DNN) Acoustic Model (AM) for Automatic Speech Recognition (ASR). In contrast, this paper introduces Hidden Markov Model (HMM) based i-vectors, which are estimated using HMM state alignment information from an ASR system. Further, we propose passing these HMM based i-vectors through an explicit non-linear hidden layer of a DNN before combining them with standard acoustic features, such as log filter bank energies (LFBEs). To improve robustness to mismatched adaptation data, we also propose estimating i-vectors in a causal fashion for training the DNN, restricting the connectivity among hidden nodes in the DNN, and applying a max-pool non-linearity at selected hidden nodes. In our experiments, these techniques yield about 5-7% relative word error rate (WER) improvement over the baseline speaker independent system in the matched condition, and a substantial WER reduction for mismatched adaptation data.
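As a rough illustration of the input path described in the abstract, the sketch below passes an i-vector through its own hidden layer, applies a max-pool non-linearity over groups of hidden nodes, and only then concatenates the result with an LFBE frame. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`ivector_branch`, `combine`), all dimensions, the pool size, and the random weights are hypothetical, and the abstract does not specify how nodes are grouped for pooling.

```python
import random


def ivector_branch(ivec, weights, bias, pool_size):
    """Affine layer on the i-vector, then max-pooling over
    non-overlapping groups of `pool_size` hidden nodes.
    (Grouping scheme is an assumption, not taken from the paper.)"""
    hidden = [
        sum(w * v for w, v in zip(row, ivec)) + b
        for row, b in zip(weights, bias)
    ]
    return [
        max(hidden[i:i + pool_size])
        for i in range(0, len(hidden), pool_size)
    ]


def combine(lfbe_frame, ivec, weights, bias, pool_size):
    """Concatenate the acoustic features with the transformed i-vector.
    The i-vector branch sees only the i-vector, never the LFBEs."""
    return list(lfbe_frame) + ivector_branch(ivec, weights, bias, pool_size)


# Illustrative sizes only; none of these values come from the paper.
LFBE_DIM, IVEC_DIM, HIDDEN_DIM, POOL_SIZE = 20, 10, 8, 2

rng = random.Random(0)
weights = [[rng.gauss(0.0, 1.0) for _ in range(IVEC_DIM)]
           for _ in range(HIDDEN_DIM)]
bias = [0.0] * HIDDEN_DIM
lfbe_frame = [rng.gauss(0.0, 1.0) for _ in range(LFBE_DIM)]
ivec = [rng.gauss(0.0, 1.0) for _ in range(IVEC_DIM)]

dnn_input = combine(lfbe_frame, ivec, weights, bias, POOL_SIZE)
print(len(dnn_input))  # 20 LFBE values + 8 / 2 pooled i-vector values = 24
```

In this sketch the combined vector would feed the remaining layers of the acoustic model; keeping the i-vector in a separate branch is one simple way to picture restricted connectivity, though the paper may restrict connections differently.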


Bibliographic reference. Garimella, Sri / Mandal, Arindam / Strom, Nikko / Hoffmeister, Bjorn / Matsoukas, Spyros / Parthasarathi, Sree Hari Krishnan (2015): "Robust i-vector based adaptation of DNN acoustic model for speech recognition", in INTERSPEECH-2015, 2877-2881.