A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting

Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Zhengkun Tian, Chenghao Zhao, Cunhang Fan


Keyword spotting requires a small memory footprint to run on mobile devices. However, previous works still use several hundred thousand parameters to achieve good performance. To address this issue, we propose a time delay neural network with shared weight self-attention for small-footprint keyword spotting. By sharing weights, the number of parameters in the self-attention module is reduced without degrading performance. The publicly available Google Speech Commands dataset is used to evaluate the models. Our model has 12K parameters, about 1/20 of the state-of-the-art ResNet model's 239K. The proposed model achieves an error rate of 4.19%, which is comparable to the ResNet model.
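The abstract does not spell out the layer equations, but the core idea of shared-weight self-attention can be sketched: instead of learning three separate projection matrices for queries, keys, and values (3·d·d parameters), a single matrix is reused for all three roles (d·d parameters). The NumPy sketch below is an illustration of that idea under assumed shapes (`T` frames of `d`-dimensional TDNN features); the function name, dimensions, and details are hypothetical, not taken from the paper.

```python
import numpy as np

def shared_weight_self_attention(X, W):
    """Scaled dot-product self-attention with one shared projection.

    X: (T, d) sequence of frame features (e.g. TDNN outputs).
    W: (d, d) projection shared by queries, keys, and values,
       replacing three separate matrices (hypothetical layout).
    Returns the (T, d) attended output.
    """
    H = X @ W                                       # one projection for Q, K, and V
    scores = H @ H.T / np.sqrt(H.shape[-1])         # scaled dot-product scores
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A = e / e.sum(axis=-1, keepdims=True)           # softmax over keys, rows sum to 1
    return A @ H                                    # weighted sum of shared values

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 64))                   # 50 frames, 64-dim features (assumed)
W = rng.standard_normal((64, 64)) / 8.0
out = shared_weight_self_attention(X, W)
print(out.shape)
```

With d = 64, the shared projection costs 64 × 64 = 4,096 parameters versus 3 × 4,096 = 12,288 for separate Q/K/V matrices, which is the kind of reduction the abstract attributes to weight sharing.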


DOI: 10.21437/Interspeech.2019-1676

Cite as: Bai, Y., Yi, J., Tao, J., Wen, Z., Tian, Z., Zhao, C., Fan, C. (2019) A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting. Proc. Interspeech 2019, 2190-2194, DOI: 10.21437/Interspeech.2019-1676.


@inproceedings{Bai2019,
  author={Ye Bai and Jiangyan Yi and Jianhua Tao and Zhengqi Wen and Zhengkun Tian and Chenghao Zhao and Cunhang Fan},
  title={{A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2190--2194},
  doi={10.21437/Interspeech.2019-1676},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1676}
}