ISCA Archive IberSPEECH 2022

An Experimental Study on Light Speech Features for Small-Footprint Keyword Spotting

Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen

Keyword spotting (KWS) is, in many instances, intended to run on smart electronic devices with limited computational resources. To meet these computational constraints, the literature has explored a range of techniques, from quantization of features and acoustic model parameters to reducing the number of model parameters and required multiplications. With the same aim, in this paper we study a straightforward alternative: reducing the spectro/cepstro-temporal resolution of the log-Mel and Mel-frequency cepstral coefficient (MFCC) feature matrices commonly employed in KWS. We show that the feature matrix size has a strong impact on the number of multiplications/energy consumption of a state-of-the-art KWS acoustic model based on a convolutional neural network. Experimental results demonstrate that the number of elements in commonly used speech feature matrices can be reduced by a factor of 8 while essentially maintaining KWS performance. Even more interestingly, this size reduction yields a 9.6× reduction in the number of multiplications/energy consumption, a 4.0× reduction in training time and a 3.7× reduction in inference time.
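To make the link between feature matrix size and multiplications concrete, the following is a minimal sketch, not the authors' code. It assumes 1 s of 16 kHz audio, a 30 ms analysis window, framing without padding, and a single stride-1, "same"-padded 3×3 convolution layer with 64 output channels; the function names `feature_shape` and `conv2d_multiplications` and all parameter values are hypothetical choices for illustration. Going from 40 bands at a 10 ms hop to 10 bands at a 20 ms hop shrinks the matrix by a factor of 8, and the multiplications of this one layer shrink by the same factor.

```python
def feature_shape(num_samples: int, win_length: int, hop_length: int, n_bands: int) -> tuple:
    """Return (n_bands, n_frames), assuming framing without padding."""
    n_frames = 1 + (num_samples - win_length) // hop_length
    return n_bands, n_frames


def conv2d_multiplications(height: int, width: int, kernel: int, c_in: int, c_out: int) -> int:
    """Multiplications of one stride-1, 'same'-padded 2-D convolution (bias ignored)."""
    # Each of the height * width * c_out outputs needs kernel * kernel * c_in products.
    return height * width * c_out * kernel * kernel * c_in


if __name__ == "__main__":
    # Hypothetical settings: 1 s at 16 kHz, 30 ms window (480 samples).
    full = feature_shape(16000, 480, 160, 40)   # 40 bands, 10 ms hop
    light = feature_shape(16000, 480, 320, 10)  # 10 bands, 20 ms hop
    mults_full = conv2d_multiplications(*full, kernel=3, c_in=1, c_out=64)
    mults_light = conv2d_multiplications(*light, kernel=3, c_in=1, c_out=64)
    print(full, light)                 # (40, 98) (10, 49)
    print(mults_full / mults_light)    # 8.0
```

For a single convolution layer the count scales linearly with the number of input elements; the whole-model figure reported above (9.6×) depends on the full network architecture and is not reproduced by this one-layer sketch.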


doi: 10.21437/IberSPEECH.2022-27

Cite as: López-Espejo, I., Tan, Z.-H., Jensen, J. (2022) An Experimental Study on Light Speech Features for Small-Footprint Keyword Spotting. Proc. IberSPEECH 2022, 131-135, doi: 10.21437/IberSPEECH.2022-27

@inproceedings{lopezespejo22_iberspeech,
  author={Iván López-Espejo and Zheng-Hua Tan and Jesper Jensen},
  title={{An Experimental Study on Light Speech Features for Small-Footprint Keyword Spotting}},
  year=2022,
  booktitle={Proc. IberSPEECH 2022},
  pages={131--135},
  doi={10.21437/IberSPEECH.2022-27}
}