Fast DNN Acoustic Model Speaker Adaptation by Learning Hidden Unit Contribution Features

Xurong Xie, Xunying Liu, Tan Lee, Lan Wang


Speaker adaptation techniques play a key role in reducing the mismatch between automatic speech recognition (ASR) systems and target users. Deep neural network (DNN) acoustic model adaptation by learning speaker-dependent hidden unit contribution (LHUC) scaling vectors has been widely used. The standard LHUC method not only requires multiple decoding passes at test time but also needs a substantial amount of adaptation data for robust parameter estimation. To address these issues, this paper proposes an efficient method that predicts and compresses the LHUC scaling vectors directly from acoustic features using a time-delay DNN (TDNN) and an online averaging layer. The resulting LHUC vectors are then used as auxiliary features to adapt DNN acoustic models. Experiments conducted on a 300-hour Switchboard corpus showed that DNN and TDNN systems using the proposed predicted LHUC features consistently outperformed the corresponding baseline systems, with relative word error rate reductions of up to about 9%. When combined with i-Vector based adaptation, the LHUC feature adapted TDNN systems demonstrated consistent improvements over a comparable i-Vector adapted TDNN system.
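To make the idea concrete, the following is a minimal, hypothetical sketch of LHUC-style scaling and of predicting a compact LHUC-like vector directly from acoustic frames with a small per-frame network plus an averaging layer, as the abstract describes. It is not the authors' implementation: the standard LHUC form h' = 2*sigmoid(r) ⊙ h is assumed, the predictor below is a plain feed-forward stand-in for the TDNN, and all names, dimensions and weights are illustrative.

```python
# Hypothetical sketch of (a) standard LHUC scaling and (b) predicting an
# LHUC-like vector from acoustic features, then using it as an auxiliary input.
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lhuc_scale(hidden, r):
    """Standard LHUC: re-scale hidden activations with a speaker-dependent
    vector r; the amplitude 2*sigmoid(r) lies in (0, 2)."""
    return 2.0 * sigmoid(r) * hidden


def predict_lhuc_features(acoustic_feats, W1, W2):
    """Stand-in for the paper's TDNN predictor: a per-frame hidden layer,
    a frame-level LHUC estimate, then averaging over frames so a single
    utterance-level vector emerges without extra decoding passes."""
    hidden = np.tanh(acoustic_feats @ W1)   # frame-level hidden layer
    frame_lhuc = hidden @ W2                # frame-level LHUC estimates
    return frame_lhuc.mean(axis=0)          # online averaging layer


# Toy usage with random weights and features (shapes are illustrative).
rng = np.random.default_rng(0)
feats = rng.standard_normal((200, 40))      # 200 frames of 40-dim features
W1 = 0.1 * rng.standard_normal((40, 128))
W2 = 0.1 * rng.standard_normal((128, 64))

r = predict_lhuc_features(feats, W1, W2)    # 64-dim predicted LHUC vector

# (a) Use r to scale a hidden layer of the acoustic model (classic LHUC).
layer_act = rng.standard_normal((200, 64))
adapted_act = lhuc_scale(layer_act, r)

# (b) Or, as in the abstract, append the predicted vector to every frame
#     as an auxiliary feature (analogous to i-Vector augmentation).
aux_input = np.concatenate([feats, np.tile(r, (feats.shape[0], 1))], axis=1)
print(adapted_act.shape, aux_input.shape)   # (200, 64) (200, 104)
```

Because the vector is produced by a single forward pass over the acoustic frames, adaptation needs no second decoding pass, which is the efficiency gain the abstract highlights over standard LHUC.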


 DOI: 10.21437/Interspeech.2019-2050

Cite as: Xie, X., Liu, X., Lee, T., Wang, L. (2019) Fast DNN Acoustic Model Speaker Adaptation by Learning Hidden Unit Contribution Features. Proc. Interspeech 2019, 759-763, DOI: 10.21437/Interspeech.2019-2050.


@inproceedings{Xie2019,
  author={Xurong Xie and Xunying Liu and Tan Lee and Lan Wang},
  title={{Fast DNN Acoustic Model Speaker Adaptation by Learning Hidden Unit Contribution Features}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={759--763},
  doi={10.21437/Interspeech.2019-2050},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2050}
}