Leveraging Second-Order Log-Linear Model for Improved Deep Learning Based ASR Performance

Ankit Raj, Shakti P Rath, Jithendra Vepa


Gaussian generative models have been shown to be equivalent to discriminative log-linear models under weak assumptions for acoustic modeling in speech recognition systems. In this paper, we note that the output layer of a deep learning model is a first-order log-linear model, also known as logistic regression, which corresponds to a set of homoscedastic distributions in the generative model space and therefore yields linear decision boundaries. We leverage this equivalence to make deep learning models more expressive by replacing the first-order log-linear model with a second-order one, which corresponds to heteroscedastic distributions; as a result, the linear decision boundaries are replaced with quadratic ones. We observe that the proposed architecture yields a significant improvement in speech recognition accuracy compared to a conventional model with a comparable number of parameters: relative improvements of 8.37% and 3.92% in word error rate (WER) are obtained for shallow and deep feed-forward networks, respectively. Moreover, with Long Short-Term Memory (LSTM) networks with a projection matrix, we obtain a significant relative WER improvement over the standard architecture.
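The core architectural change described above can be sketched in NumPy: a standard output layer computes logits as an affine function of the hidden activations (linear decision boundaries), while the second-order variant adds a per-class quadratic term (quadratic decision boundaries). This is an illustrative sketch only, not the authors' implementation; all names (`first_order_logits`, `second_order_logits`, the dimensions) are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def first_order_logits(h, W, b):
    # Conventional output layer: z_c = w_c^T h + b_c
    # (homoscedastic case -> linear decision boundaries).
    return h @ W.T + b

def second_order_logits(h, A, W, b):
    # Second-order log-linear model: z_c = h^T A_c h + w_c^T h + b_c
    # (heteroscedastic case -> quadratic decision boundaries).
    quad = np.einsum('bi,cij,bj->bc', h, A, h)  # per-class quadratic form
    return quad + h @ W.T + b

# Toy dimensions: batch size, hidden dimension, number of classes (hypothetical).
rng = np.random.default_rng(0)
B, D, C = 4, 8, 3
h = rng.standard_normal((B, D))          # hidden-layer activations
W = rng.standard_normal((C, D))          # first-order weights
b = rng.standard_normal(C)               # biases
A = 0.1 * rng.standard_normal((C, D, D)) # per-class second-order weights

p_first = softmax(first_order_logits(h, W, b))
p_second = softmax(second_order_logits(h, A, W, b))
```

Note that the second-order model adds C·D·D parameters via the tensors `A_c`, which is why the paper compares against a conventional model with a comparable number of parameters.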


DOI: 10.21437/Interspeech.2018-1156

Cite as: Raj, A., Rath, S.P., Vepa, J. (2018) Leveraging Second-Order Log-Linear Model for Improved Deep Learning Based ASR Performance. Proc. Interspeech 2018, 3738-3742, DOI: 10.21437/Interspeech.2018-1156.


@inproceedings{Raj2018,
  author={Ankit Raj and Shakti P Rath and Jithendra Vepa},
  title={Leveraging Second-Order Log-Linear Model for Improved Deep Learning Based ASR Performance},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3738--3742},
  doi={10.21437/Interspeech.2018-1156},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1156}
}