Small Footprint Multi-channel Keyword Spotting

Jilong Wu, Yiteng Huang, Hyun-Jin Park, Niranjan Subrahmanya, Patrick Violette


Noise robustness remains a challenging problem in on-device keyword spotting. Multi-microphone algorithms such as beamforming improve accuracy, but they increase computational complexity and tend to require more memory. In this paper, we propose a new neural-network-based architecture that takes multiple microphone signals as inputs. It achieves better accuracy while incurring only a minimal increase in model size. Compared with a single-channel baseline that runs in parallel on each channel, the proposed architecture reduces the false reject (FR) rate by 36.3% and 46.4% relative on dual-microphone clean and noisy test sets, respectively, at a fixed false accept rate.
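
The "relative" reductions quoted in the abstract compare the false reject rates of two systems at the same false accept operating point. As a minimal sketch of that arithmetic, assuming the usual definition (baseline FR minus proposed FR, divided by baseline FR), the computation could look like the following. The function name and the example rates are hypothetical and are not taken from the paper.

```python
def relative_fr_reduction(baseline_fr: float, proposed_fr: float) -> float:
    """Return the relative false reject (FR) reduction as a percentage.

    Both rates are assumed to be measured at the same fixed false accept rate.
    """
    if baseline_fr <= 0:
        raise ValueError("baseline_fr must be positive")
    return 100.0 * (baseline_fr - proposed_fr) / baseline_fr


# Hypothetical FR rates for illustration only (not results from the paper).
example = relative_fr_reduction(baseline_fr=0.10, proposed_fr=0.06)
print(f"{example:.1f}% relative FR reduction")
```

Under this definition, a 36.3% relative reduction means the proposed model's FR rate is 63.7% of the baseline's at the chosen operating point; it says nothing about the absolute FR values, which the abstract does not give.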


DOI: 10.21437/Odyssey.2020-55

Cite as: Wu, J., Huang, Y., Park, H., Subrahmanya, N., Violette, P. (2020) Small Footprint Multi-channel Keyword Spotting. Proc. Odyssey 2020 The Speaker and Language Recognition Workshop, 391-395, DOI: 10.21437/Odyssey.2020-55.


@inproceedings{Wu2020,
  author={Jilong Wu and Yiteng Huang and Hyun-Jin Park and Niranjan Subrahmanya and Patrick Violette},
  title={{Small Footprint Multi-channel Keyword Spotting}},
  year=2020,
  booktitle={Proc. Odyssey 2020 The Speaker and Language Recognition Workshop},
  pages={391--395},
  doi={10.21437/Odyssey.2020-55},
  url={http://dx.doi.org/10.21437/Odyssey.2020-55}
}