Non-Uniform MCE Training of Deep Long Short-Term Memory Recurrent Neural Networks for Keyword Spotting

Zhong Meng, Biing-Hwang Juang


It has been shown in [1, 2] that improved performance can be achieved by formulating keyword spotting as a non-uniform error automatic speech recognition problem. In this work, we discriminatively train a deep bidirectional long short-term memory (BLSTM)-hidden Markov model (HMM) acoustic model with a non-uniform boosted minimum classification error (BMCE) criterion, which imposes higher error costs on keywords than on non-keywords. With the BLSTM, context information from both the past and the future is stored and updated to predict the desired output, so that long-term dependencies within the speech signal are well captured. Under the non-uniform BMCE objective, the BLSTM is trained so that recognition errors related to keywords are markedly reduced. The BLSTM is optimized using back-propagation through time and stochastic gradient descent. The keyword spotting system is implemented within the weighted finite-state transducer (WFST) framework. The proposed method achieves 5.49% and 7.37% absolute figure-of-merit improvements over the BLSTM and feedforward deep neural network baseline systems, respectively, both trained with the cross-entropy criterion, for the keyword spotting task on the Switchboard-1 Release 2 dataset.
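The non-uniform BMCE objective described above can be sketched as follows. The notation here is illustrative and not taken from the paper: a smoothed misclassification measure contrasts the reference path score with boosted competing hypotheses, and a per-utterance cost factor weights keyword-related errors more heavily than non-keyword errors.

```latex
% Illustrative sketch of a non-uniform boosted MCE criterion (notation assumed).
% g(X_r, W; \Lambda): log path score of word sequence W for utterance X_r
% under model \Lambda; b \ge 0 is a boosting margin scaled by an
% accuracy/error measure A(W, W_r) between hypothesis and reference.
d_r(\Lambda) = -\, g(X_r, W_r; \Lambda)
  + \log \sum_{W \neq W_r} \exp\!\big[\, g(X_r, W; \Lambda) + b\, A(W, W_r) \big]

% Smoothed loss over R training utterances, with sigmoid slope \alpha and
% offset \beta; the non-uniform cost \epsilon_r > 1 when the errors in
% utterance r involve keywords, and \epsilon_r = 1 otherwise.
L_{\mathrm{BMCE}}(\Lambda) = \sum_{r=1}^{R} \epsilon_r \,
  \frac{1}{1 + \exp\!\big[ -\alpha\, d_r(\Lambda) - \beta \big]}
```

Minimizing this loss with back-propagation through time and stochastic gradient descent, as the abstract states, pushes the model to reduce keyword-related errors preferentially, since those terms carry the larger weight $\epsilon_r$.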


DOI: 10.21437/Interspeech.2017-583

Cite as: Meng, Z., Juang, B. (2017) Non-Uniform MCE Training of Deep Long Short-Term Memory Recurrent Neural Networks for Keyword Spotting. Proc. Interspeech 2017, 3547-3551, DOI: 10.21437/Interspeech.2017-583.


@inproceedings{Meng2017,
  author={Zhong Meng and Biing-Hwang Juang},
  title={Non-Uniform MCE Training of Deep Long Short-Term Memory Recurrent Neural Networks for Keyword Spotting},
  year={2017},
  booktitle={Proc. Interspeech 2017},
  pages={3547--3551},
  doi={10.21437/Interspeech.2017-583},
  url={http://dx.doi.org/10.21437/Interspeech.2017-583}
}