ISCA Archive Interspeech 2017

Non-Uniform MCE Training of Deep Long Short-Term Memory Recurrent Neural Networks for Keyword Spotting

Zhong Meng, Biing-Hwang Juang

It has been shown in [1, 2] that improved performance can be achieved by formulating keyword spotting as a non-uniform error automatic speech recognition problem. In this work, we discriminatively train a deep bidirectional long short-term memory (BLSTM) hidden Markov model (HMM) based acoustic model with a non-uniform boosted minimum classification error (BMCE) criterion, which imposes a larger error cost on the keywords than on the non-keywords. By introducing the BLSTM, context information from both the past and the future is stored and updated to predict the desired output, and the long-term dependencies within the speech signal are well captured. With the non-uniform BMCE objective, the BLSTM is trained so that the recognition errors related to the keywords are substantially reduced. The BLSTM is optimized using back-propagation through time and stochastic gradient descent. The keyword spotting system is implemented within the weighted finite-state transducer framework. The proposed method achieves 5.49% and 7.37% absolute figure-of-merit improvements, respectively, over the BLSTM and the feedforward deep neural network baseline systems trained with the cross-entropy criterion for the keyword spotting task on the Switchboard-1 Release 2 dataset.
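The paper itself defines the exact non-uniform BMCE objective; as a rough, hypothetical illustration of the underlying idea only (the function name, parameter names, and default values below are assumptions, not taken from the paper), a smoothed MCE-style loss in which a per-token cost weight boosts the penalty on keyword errors might be sketched as:

```python
import numpy as np

def non_uniform_mce_loss(correct_score, competitor_scores,
                         keyword_weight=1.0, zeta=1.0, theta=0.0, eta=10.0):
    """Hypothetical sketch of a non-uniform MCE-style loss (not the paper's
    exact BMCE formulation).

    correct_score:     discriminant score of the reference class
    competitor_scores: discriminant scores of the competing classes
    keyword_weight:    > 1 for keyword tokens, 1 for non-keywords,
                       realizing the non-uniform error cost
    zeta, theta:       slope and shift of the sigmoid smoothing
    eta:               smoothing constant of the soft-max over competitors
    """
    competitor_scores = np.asarray(competitor_scores, dtype=float)
    # Smoothed anti-discriminant: soft-max over the competing classes.
    anti = (1.0 / eta) * np.log(np.mean(np.exp(eta * competitor_scores)))
    # Misclassification measure: positive when competitors outscore the reference.
    d = -correct_score + anti
    # Sigmoid smoothing of the 0-1 loss, scaled by the non-uniform cost weight.
    return keyword_weight / (1.0 + np.exp(-zeta * d - theta))
```

A correctly recognized token (reference score well above the competitors) yields a loss near zero, a misrecognized one a loss near `keyword_weight`, so gradients from keyword errors dominate the update, which is the effect the non-uniform criterion is after.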


doi: 10.21437/Interspeech.2017-583

Cite as: Meng, Z., Juang, B.-H. (2017) Non-Uniform MCE Training of Deep Long Short-Term Memory Recurrent Neural Networks for Keyword Spotting. Proc. Interspeech 2017, 3547-3551, doi: 10.21437/Interspeech.2017-583

@inproceedings{meng17b_interspeech,
  author={Zhong Meng and Biing-Hwang Juang},
  title={{Non-Uniform MCE Training of Deep Long Short-Term Memory Recurrent Neural Networks for Keyword Spotting}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3547--3551},
  doi={10.21437/Interspeech.2017-583}
}