Model Compression Applied to Small-Footprint Keyword Spotting

George Tucker, Minhua Wu, Ming Sun, Sankaran Panchapagesan, Gengshen Fu, Shiv Vitaladevuni


Several consumer speech devices feature voice interfaces that perform on-device keyword spotting to initiate user interactions. Accurate on-device keyword spotting within a tight CPU budget is crucial for such devices. Motivated by this, we investigated two ways to improve deep neural network (DNN) acoustic models for keyword spotting without increasing CPU usage. First, we used low-rank weight matrices throughout the DNN. This allowed us to increase representational power by increasing the number of hidden nodes per layer without changing the total number of multiplications. Second, we used knowledge distilled from an ensemble of much larger DNNs, which were used only during training. We systematically evaluated these two approaches on a massive corpus of far-field utterances. Alone, each technique improves performance, and together they combine to give significant reductions in false alarms and misses without increasing CPU or memory usage.
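The two techniques in the abstract can be illustrated with a short sketch. This is not the paper's code; the layer sizes, rank, and distillation temperature below are assumptions chosen for illustration. The first part shows why a low-rank factorization W ≈ UV frees up a multiply budget (an m×n matrix costs m·n multiplications per frame, while the factored form costs r·(m+n)); the second part shows the usual soft-target distillation loss against temperature-softened teacher posteriors.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Low-rank weight matrices (assumed sizes, for illustration only) ---
m, n, r = 512, 512, 64                 # input dim, output dim, rank
W_full = rng.standard_normal((m, n))   # dense layer: m*n multiplications
U = rng.standard_normal((m, r))        # factored layer: W ~= U @ V
V = rng.standard_normal((r, n))

full_mults = m * n                     # multiplications for the dense layer
lowrank_mults = r * (m + n)            # multiplications for the factored layer

x = rng.standard_normal(m)
h = (x @ U) @ V                        # low-rank forward pass, shape (n,)

# --- Knowledge distillation (generic soft-target loss, assumed T) ---
def softmax(z, T=1.0):
    """Temperature-softened softmax; higher T gives softer posteriors."""
    z = z / T - (z / T).max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

T = 2.0                                # assumed distillation temperature
teacher_logits = rng.standard_normal(3)  # stand-in for ensemble outputs
student_logits = rng.standard_normal(3)
soft_targets = softmax(teacher_logits, T)
student_probs = softmax(student_logits, T)
# cross-entropy of the student against the teacher's softened posteriors
distill_loss = -np.sum(soft_targets * np.log(student_probs))
```

With these sizes the factored layer uses 65,536 multiplications versus 262,144 for the dense one, which is the headroom the paper spends on wider hidden layers at a fixed compute budget.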


DOI: 10.21437/Interspeech.2016-1393

Cite as

Tucker, G., Wu, M., Sun, M., Panchapagesan, S., Fu, G., Vitaladevuni, S. (2016) Model Compression Applied to Small-Footprint Keyword Spotting. Proc. Interspeech 2016, 1878-1882.

Bibtex
@inproceedings{Tucker+2016,
  author={George Tucker and Minhua Wu and Ming Sun and Sankaran Panchapagesan and Gengshen Fu and Shiv Vitaladevuni},
  title={Model Compression Applied to Small-Footprint Keyword Spotting},
  year=2016,
  booktitle={Interspeech 2016},
  doi={10.21437/Interspeech.2016-1393},
  url={http://dx.doi.org/10.21437/Interspeech.2016-1393},
  pages={1878--1882}
}