Learning Factorized Transforms for Unsupervised Adaptation of LSTM-RNN Acoustic Models

Lahiru Samarakoon, Brian Mak, Khe Chai Sim


Factorized Hidden Layer (FHL) adaptation has been proposed for speaker adaptation of deep neural network (DNN) based acoustic models. In FHL adaptation, a speaker-dependent (SD) transformation matrix and an SD bias are included in addition to the standard affine transformation. The SD transformation is a linear combination of rank-1 matrices, whereas the SD bias is a linear combination of vectors. Recently, Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) have been shown to outperform DNN acoustic models in many Automatic Speech Recognition (ASR) tasks. In this work, we investigate the effectiveness of SD transformations for LSTM-RNN acoustic models. Experimental results show that, when combined with scaling of the LSTM cell states' outputs, SD transformations achieve 2.3% and 2.1% absolute improvements over the baseline LSTM systems for the AMI IHM and AMI SDM tasks, respectively.
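
The FHL affine step described in the abstract can be illustrated with a minimal NumPy sketch. It is only an illustration under stated assumptions, not the authors' implementation: the function name `fhl_affine`, the argument names (`gammas`, `betas`, `d`, `U`, `v`), and the shapes are all hypothetical, and the abstract does not specify how the bases or the per-speaker weights are estimated. The sketch shows only the pre-activation output of one feed-forward layer; how the transform is applied inside an LSTM is not covered here.

```python
import numpy as np

def fhl_affine(x, W, b, gammas, betas, d, U, v):
    """Pre-activation output of one layer with an FHL-style SD transform.

    All names and shapes are assumptions made for illustration:
      x      : (in_dim,)        input vector
      W, b   : (out_dim, in_dim), (out_dim,)  standard affine parameters
      gammas : (out_dim, K)     left factors of the K rank-1 bases
      betas  : (in_dim, K)      right factors of the K rank-1 bases
      d      : (K,)             per-speaker weights for the rank-1 bases
      U      : (out_dim, M)     bias basis vectors (one per column)
      v      : (M,)             per-speaker weights for the bias basis
    """
    # SD transformation: sum_k d[k] * outer(gammas[:, k], betas[:, k]),
    # computed by scaling the columns of gammas and multiplying by betas^T.
    W_sd = (gammas * d) @ betas.T
    # SD bias: linear combination of the columns of U.
    b_sd = U @ v
    # Standard affine transformation plus the speaker-dependent terms.
    return (W + W_sd) @ x + (b + b_sd)
```

Setting `d` and `v` to zero recovers the standard affine transformation `W @ x + b`, so in this sketch the speaker-dependent terms act purely as an additive correction to the speaker-independent layer.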


DOI: 10.21437/Interspeech.2017-1136

Cite as: Samarakoon, L., Mak, B., Sim, K.C. (2017) Learning Factorized Transforms for Unsupervised Adaptation of LSTM-RNN Acoustic Models. Proc. Interspeech 2017, 744-748, DOI: 10.21437/Interspeech.2017-1136.


@inproceedings{Samarakoon2017,
  author={Lahiru Samarakoon and Brian Mak and Khe Chai Sim},
  title={Learning Factorized Transforms for Unsupervised Adaptation of LSTM-RNN Acoustic Models},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={744--748},
  doi={10.21437/Interspeech.2017-1136},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1136}
}