Compressed Time Delay Neural Network for Small-Footprint Keyword Spotting

Ming Sun, David Snyder, Yixin Gao, Varun Nagaraja, Mike Rodehorst, Sankaran Panchapagesan, Nikko Strom, Spyros Matsoukas, Shiv Vitaladevuni


In this paper we investigate a time delay neural network (TDNN) for a keyword spotting task that requires low CPU usage, a small memory footprint, and low latency. The TDNN is trained with transfer learning and multi-task learning. Temporal subsampling enabled by the time delay architecture reduces computational complexity. We propose applying singular value decomposition (SVD) to further reduce TDNN complexity. This allows us to first train a larger full-rank TDNN that is not limited by CPU or memory constraints and that usually achieves better performance, and then to compress it by SVD to meet the budget requirements. Hidden Markov models (HMMs) are used in conjunction with the networks to perform keyword detection, and performance is measured in terms of area under the curve (AUC) for detection error tradeoff (DET) curves. Our experimental results on a large in-house far-field corpus show that the full-rank TDNN achieves a 19.7% DET AUC reduction compared to a similar-size deep neural network (DNN) baseline. If we first train a larger full-rank TDNN and then reduce it via SVD to a size comparable to that of the DNN, we obtain a 37.6% reduction in DET AUC compared to the DNN baseline.
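The SVD compression step described in the abstract can be illustrated with a minimal sketch. It factors one trained weight matrix into two thinner matrices by keeping only the largest singular values, which is the standard truncated-SVD approach to low-rank layer compression. The helper name `svd_compress`, the layer dimensions (256 x 512), and the retained rank (32) are illustrative assumptions, not values from the paper, and the abstract does not say which layers are compressed or whether the model is fine-tuned afterwards.

```python
import numpy as np


def svd_compress(W, rank):
    """Approximate W (out_dim x in_dim) as A @ B with A (out_dim x rank), B (rank x in_dim).

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Thin SVD: W = U @ diag(s) @ Vt, singular values in s sorted in descending order.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep the top `rank` components and fold the singular values into A.
    A = U[:, :rank] * s[:rank]
    B = Vt[:rank, :]
    return A, B


# Assumed layer shape and rank, chosen only to make the arithmetic easy to follow.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))
A, B = svd_compress(W, rank=32)

# Parameter counts before and after factorization.
full_params = W.size                  # 256 * 512
compressed_params = A.size + B.size   # 256 * 32 + 32 * 512

# At inference time one full-rank layer becomes two smaller matrix products.
x = rng.standard_normal(512)
y_full = W @ x
y_low_rank = A @ (B @ x)
```

With these assumed sizes, the factorized layer stores 24,576 weights instead of 131,072. A random matrix like the one above has no low-rank structure, so the approximation error here is large; the approach relies on trained weight matrices having rapidly decaying singular values.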


DOI: 10.21437/Interspeech.2017-480

Cite as: Sun, M., Snyder, D., Gao, Y., Nagaraja, V., Rodehorst, M., Panchapagesan, S., Strom, N., Matsoukas, S., Vitaladevuni, S. (2017) Compressed Time Delay Neural Network for Small-Footprint Keyword Spotting. Proc. Interspeech 2017, 3607-3611, DOI: 10.21437/Interspeech.2017-480.


@inproceedings{Sun2017,
  author={Ming Sun and David Snyder and Yixin Gao and Varun Nagaraja and Mike Rodehorst and Sankaran Panchapagesan and Nikko Strom and Spyros Matsoukas and Shiv Vitaladevuni},
  title={Compressed Time Delay Neural Network for Small-Footprint Keyword Spotting},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3607--3611},
  doi={10.21437/Interspeech.2017-480},
  url={http://dx.doi.org/10.21437/Interspeech.2017-480}
}