Non-Uniform Boosted MCE Training of Deep Neural Networks for Keyword Spotting

Zhong Meng, Biing-Hwang Juang


Keyword spotting can be formulated as a non-uniform error automatic speech recognition (ASR) problem. It has been demonstrated [1] that this formulation, combined with non-uniform minimum classification error (MCE) training, can improve system performance in keyword spotting applications. In this paper, we demonstrate that deep neural networks (DNNs) can be successfully trained on the non-uniform MCE criterion, which weights errors on keywords much more heavily than errors on non-keywords in an ASR task. Integrating this training with a DNN-HMM system enables modeling of multi-frame distributions, which conventional systems find difficult to accomplish. To further improve performance, more confusable data is generated by boosting the likelihood of sentences that contain more errors. The keyword spotting system is implemented within a weighted finite-state transducer (WFST) framework, and the DNN is optimized using standard backpropagation and stochastic gradient descent. We evaluate the proposed framework on a large-vocabulary spontaneous conversational telephone speech dataset (Switchboard-1 Release 2). The proposed approach achieves an absolute figure of merit improvement of 3.65% over the baseline system.
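The abstract does not state the training criterion in closed form. What follows is a minimal, hypothetical Python sketch of how the three ideas could fit together: non-uniform error costs, likelihood boosting of error-heavy hypotheses, and a differentiable MCE loss. Several assumptions are not taken from the paper. The sketch uses sentence-level log-likelihood scores for the reference and for an N-best list of competing hypotheses, and it applies a pairwise sigmoid MCE loss per competitor. Each competitor's cost is a weighted error count in which keyword errors are weighted more heavily. Boosting adds a constant times the competitor's total error count to its score. The names `Hypothesis`, `nonuniform_boosted_mce_loss`, `keyword_cost`, `other_cost`, `boost` and `gamma` are illustrative and do not come from the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hypothesis:
    """A competing sentence hypothesis (hypothetical representation)."""
    log_likelihood: float  # model score g(X; hypothesis)
    keyword_errors: int    # errors on keyword tokens relative to the reference
    other_errors: int      # errors on non-keyword tokens relative to the reference


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def nonuniform_boosted_mce_loss(
    ref_score: float,
    competitors: List[Hypothesis],
    keyword_cost: float = 10.0,  # assumed weight per keyword error
    other_cost: float = 1.0,     # assumed weight per non-keyword error
    boost: float = 0.5,          # assumed boosting factor per error
    gamma: float = 1.0,          # sigmoid slope
) -> Tuple[float, float, List[float]]:
    """Return (loss, d loss / d ref_score, [d loss / d competitor score]).

    Sketch only: a pairwise sigmoid MCE loss, weighted by a non-uniform
    error cost and computed on boosted competitor scores.
    """
    loss = 0.0
    grads: List[float] = []
    for hyp in competitors:
        n_err = hyp.keyword_errors + hyp.other_errors
        if n_err == 0:
            # A competitor with no errors is not a misclassification.
            grads.append(0.0)
            continue
        # Boosting: hypotheses with more errors receive a higher score,
        # which makes them more confusable during training.
        boosted = hyp.log_likelihood + boost * n_err
        # Misclassification measure: positive when the competitor outscores
        # the reference.
        d = boosted - ref_score
        # Non-uniform cost: keyword errors count more than other errors.
        cost = keyword_cost * hyp.keyword_errors + other_cost * hyp.other_errors
        s = sigmoid(gamma * d)
        loss += cost * s
        # The derivative of cost * sigmoid(gamma * d) w.r.t. the competitor score.
        grads.append(cost * gamma * s * (1.0 - s))
    # d is (competitor - reference), so the reference gradient is the negated sum.
    grad_ref = -sum(grads)
    return loss, grad_ref, grads


if __name__ == "__main__":
    comps = [Hypothesis(-100.0, keyword_errors=1, other_errors=0),
             Hypothesis(-102.0, keyword_errors=0, other_errors=2)]
    loss, g_ref, g_comp = nonuniform_boosted_mce_loss(-101.0, comps)
    print(round(loss, 2))  # loss ≈ 9.18
```

The score-level gradients returned here are what a backpropagation step would push back toward the DNN outputs through the HMM/WFST scoring. The sketch deliberately omits that part, since the abstract does not specify it. A pairwise, cost-weighted loss was chosen because it lets keyword and non-keyword errors carry different weights in a single transparent expression.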


DOI: 10.21437/Interspeech.2016-642

Cite as

Meng, Z., Juang, B. (2016) Non-Uniform Boosted MCE Training of Deep Neural Networks for Keyword Spotting. Proc. Interspeech 2016, 770-774.

BibTeX
@inproceedings{Meng+2016,
author={Zhong Meng and Biing-Hwang Juang},
title={Non-Uniform Boosted MCE Training of Deep Neural Networks for Keyword Spotting},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-642},
url={http://dx.doi.org/10.21437/Interspeech.2016-642},
pages={770--774}
}