In this paper, we introduce a streaming keyphrase detection system that can be easily customized to accurately detect any phrase composed of words from a large vocabulary. The system combines an end-to-end trained automatic speech recognition (ASR) model with a text-independent speaker verification model. To detect these keyphrases reliably under noisy conditions, we add a speaker separation model to the feature frontend of the speaker verification model and include an adaptive noise cancellation (ANC) algorithm that exploits cross-microphone noise coherence. Our experiments show that the text-independent speaker verification model substantially reduces the false triggering rate of keyphrase detection, while the speaker separation model and adaptive noise cancellation substantially reduce false rejections.
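As a rough illustration of how a speaker verification check can lower false triggering, the sketch below accepts a detection only when the recognized text matches a keyphrase and the utterance's speaker embedding is close to an enrolled one. This is a minimal sketch and not the system described above: the function names, the exact-match rule, the use of cosine similarity, and the threshold of 0.7 are all assumptions made for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def should_trigger(
    transcript: str,
    keyphrases: Sequence[str],
    utterance_embedding: Sequence[float],
    enrolled_embedding: Sequence[float],
    threshold: float = 0.7,  # hypothetical value, not taken from the paper
) -> bool:
    """Accept only if a keyphrase is recognized AND the speaker matches."""
    text = transcript.lower().strip()
    # Assumption: a case-insensitive exact match stands in for the ASR-based detector.
    phrase_hit = any(text == phrase.lower() for phrase in keyphrases)
    if not phrase_hit:
        return False
    # Assumption: speaker identity is scored by cosine similarity of embeddings.
    score = cosine_similarity(utterance_embedding, enrolled_embedding)
    return score >= threshold
```

In this sketch, a correct phrase spoken by a non-enrolled speaker is rejected at the second check, which is the mechanism by which a speaker gate can reduce false triggers.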
Cite as: Rikhye, R., Wang, Q., Liang, Q., He, Y., Zhao, D., Huang, Y., Narayanan, A., McGraw, I. (2021) Personalized Keyphrase Detection Using Speaker and Environment Information. Proc. Interspeech 2021, 4204-4208, doi: 10.21437/Interspeech.2021-204
@inproceedings{rikhye21_interspeech,
  author={Rajeev Rikhye and Quan Wang and Qiao Liang and Yanzhang He and Ding Zhao and Yiteng Huang and Arun Narayanan and Ian McGraw},
  title={{Personalized Keyphrase Detection Using Speaker and Environment Information}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={4204--4208},
  doi={10.21437/Interspeech.2021-204}
}