Low-Dimensional Bottleneck Features for On-Device Continuous Speech Recognition

David B. Ramsay, Kevin Kilgour, Dominik Roblek, Matthew Sharifi


Low-power digital signal processors (DSPs) typically have very little memory in which to cache data. In this paper we develop efficient bottleneck feature (BNF) extractors that can run on a DSP, and retrain a baseline large-vocabulary continuous speech recognition (LVCSR) system to use these BNFs with only a minimal loss of accuracy. The small BNFs allow the DSP chip to cache more audio features while the main application processor is suspended, thereby reducing overall battery usage. The presented system reduces the footprint of standard, fixed-point DSP spectral features by a factor of 10 without any loss in word error rate (WER), and by a factor of 64 with only a 5.8% relative increase in WER.
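
To make the caching argument concrete, the following minimal sketch estimates how many seconds of features fit in a fixed DSP buffer at each footprint reduction factor. Only the factors 10 and 64 come from the abstract; the buffer size, the baseline bytes per frame, and the frame rate are hypothetical values chosen for illustration and are not taken from the paper.

```python
def cached_seconds(buffer_bytes: int, bytes_per_frame: float,
                   frames_per_second: float) -> float:
    """Seconds of audio features that fit in a fixed-size buffer."""
    return buffer_bytes / (bytes_per_frame * frames_per_second)


# Hypothetical values for illustration only (not from the paper).
BUFFER_BYTES = 64 * 1024          # assumed DSP cache budget
BASELINE_BYTES_PER_FRAME = 64.0   # assumed size of one spectral feature frame
FRAMES_PER_SECOND = 100.0         # assumed frame rate (10 ms hop)


def cache_table(reduction_factors=(1, 10, 64)):
    """Map each footprint reduction factor to the cacheable duration in seconds."""
    return {
        factor: cached_seconds(
            BUFFER_BYTES,
            BASELINE_BYTES_PER_FRAME / factor,
            FRAMES_PER_SECOND,
        )
        for factor in reduction_factors
    }


if __name__ == "__main__":
    for factor, seconds in cache_table().items():
        print(f"reduction x{factor}: {seconds:.2f} s of features cached")
```

Under these assumptions the cacheable duration scales linearly with the reduction factor, so the application processor can stay suspended proportionally longer between wake-ups.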


DOI: 10.21437/Interspeech.2019-2193

Cite as: Ramsay, D.B., Kilgour, K., Roblek, D., Sharifi, M. (2019) Low-Dimensional Bottleneck Features for On-Device Continuous Speech Recognition. Proc. Interspeech 2019, 3456-3459, DOI: 10.21437/Interspeech.2019-2193.


@inproceedings{Ramsay2019,
  author={David B. Ramsay and Kevin Kilgour and Dominik Roblek and Matthew Sharifi},
  title={{Low-Dimensional Bottleneck Features for On-Device Continuous Speech Recognition}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3456--3459},
  doi={10.21437/Interspeech.2019-2193},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2193}
}