Unrestricted Vocabulary Keyword Spotting Using LSTM-CTC

Yimeng Zhuang, Xuankai Chang, Yanmin Qian, Kai Yu


Keyword spotting (KWS) aims to detect predefined keywords in continuous speech. Recently, direct deep learning approaches have been applied to KWS with great success. However, these approaches mostly assume a fixed keyword vocabulary and require significant retraining effort if new keywords are to be detected. For unrestricted vocabulary, the HMM-based keyword-filler framework is still the mainstream technique. In this paper, a novel deep learning approach is proposed for unrestricted vocabulary KWS based on Connectionist Temporal Classification (CTC) with Long Short-Term Memory (LSTM). Here, an LSTM is trained to discriminate phones with the CTC criterion. During KWS, an arbitrary keyword can be specified, and it is represented by one or more phone sequences. Because CTC yields peaky phone posteriors, the LSTM can produce a phone lattice. Then, a fast substring matching algorithm based on minimum edit distance is used to search for the keyword phone sequence in the phone lattice. The approach is highly efficient and vocabulary independent. Experiments showed that the proposed approach achieves significantly better results than a DNN-HMM based keyword-filler decoding system. In addition, the proposed approach is also more efficient than the DNN-HMM KWS baseline.
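
To make the matching step concrete, the following is a minimal sketch of approximate substring matching by minimum edit distance. It is not the authors' implementation. It assumes a single decoded phone sequence rather than a full phone lattice, and unit costs for substitution, insertion, and deletion. The function names, the phone symbols, and the `max_dist` threshold are hypothetical and chosen only for illustration.

```python
def min_substring_edit_distance(keyword, phones):
    """Smallest edit distance between `keyword` and any contiguous span of `phones`.

    Assumption: unit costs for all edit operations; the paper's actual
    costs and its lattice-based search are not reproduced here.
    """
    # prev[j] is the cost of aligning the keyword prefix processed so far
    # with some span of `phones` that ends at position j. The first row is
    # all zeros, so a match may start anywhere in the utterance at no cost.
    prev = [0] * (len(phones) + 1)
    for i, k in enumerate(keyword, start=1):
        curr = [i] + [0] * len(phones)
        for j, p in enumerate(phones, start=1):
            curr[j] = min(
                prev[j - 1] + (k != p),  # match (cost 0) or substitution (cost 1)
                prev[j] + 1,             # keyword phone missing from the utterance
                curr[j - 1] + 1,         # extra phone in the utterance
            )
        prev = curr
    # The match may also end anywhere, so take the best end position.
    return min(prev)


def spot_keyword(pronunciations, phones, max_dist=1):
    """Hypothetical detector: fire if any pronunciation matches within max_dist edits."""
    return any(
        min_substring_edit_distance(pron, phones) <= max_dist
        for pron in pronunciations
    )


# Illustrative phone strings (hypothetical symbols, not taken from the paper).
utterance = ["dh", "ax", "k", "eh", "t", "s", "ae", "t"]
keyword_prons = [["k", "ae", "t"]]
detected = spot_keyword(keyword_prons, utterance, max_dist=1)
```

Because a keyword is just a list of phone sequences in this sketch, adding a new keyword needs only a new pronunciation and no retraining, which is the vocabulary independence the abstract refers to.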


DOI: 10.21437/Interspeech.2016-753

Cite as

Zhuang, Y., Chang, X., Qian, Y., Yu, K. (2016) Unrestricted Vocabulary Keyword Spotting Using LSTM-CTC. Proc. Interspeech 2016, 938-942.

Bibtex
@inproceedings{Zhuang+2016,
author={Yimeng Zhuang and Xuankai Chang and Yanmin Qian and Kai Yu},
title={Unrestricted Vocabulary Keyword Spotting Using LSTM-CTC},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-753},
url={http://dx.doi.org/10.21437/Interspeech.2016-753},
pages={938--942}
}