Small-Footprint Magic Word Detection Method Using Convolutional LSTM Neural Network

Taiki Yamamoto, Ryota Nishimura, Masayuki Misaki, Norihide Kitaoka


The number of consumer devices that can be operated by voice is increasing every year. Magic Word Detection (MWD), the detection of an activation keyword in continuous speech, has become an essential technology for the hands-free operation of such devices. Because MWD systems must run constantly in order to detect Magic Words at any time, many studies have focused on developing small-footprint systems. In this paper, we propose a novel, small-footprint MWD method which uses a convolutional Long Short-Term Memory (LSTM) neural network to capture frequency-domain and time-domain features over time. The proposed method outperforms the baseline method while reducing the number of parameters by more than 80%. An experiment on a small-scale device demonstrates that our model is efficient enough to function in real time.


DOI: 10.21437/Interspeech.2019-1662

Cite as: Yamamoto, T., Nishimura, R., Misaki, M., Kitaoka, N. (2019) Small-Footprint Magic Word Detection Method Using Convolutional LSTM Neural Network. Proc. Interspeech 2019, 2035-2039, DOI: 10.21437/Interspeech.2019-1662.


@inproceedings{Yamamoto2019,
  author={Taiki Yamamoto and Ryota Nishimura and Masayuki Misaki and Norihide Kitaoka},
  title={{Small-Footprint Magic Word Detection Method Using Convolutional LSTM Neural Network}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2035--2039},
  doi={10.21437/Interspeech.2019-1662},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1662}
}