Constructing Acoustic Distances Between Subwords and States Obtained from a Deep Neural Network for Spoken Term Detection

Daisuke Kaneko, Ryota Konno, Kazunori Kojima, Kazuyo Tanaka, Shi-wook Lee, Yoshiaki Itoh


The detection of out-of-vocabulary (OOV) query terms is a crucial problem in spoken term detection (STD), because such terms are likely to appear in queries. To enable OOV query terms to be searched in STD systems, a query subword sequence is compared with the subword sequences that an automatic speech recognizer generates from the spoken documents. When two subword sequences are compared, the edit distance is typically used as the distance between any two subwords. We previously proposed an acoustic distance defined from statistics between states of the hidden Markov model (HMM) and showed its effectiveness in STD [4]. This paper proposes an acoustic distance between subwords and HMM states, in which the posterior probabilities output by a deep neural network are used to improve the STD accuracy for OOV query terms. Experiments are conducted to evaluate the performance of the proposed method, using the open test collections for the “Spoken&Doc” tasks of the NTCIR-9 [13] and NTCIR-10 [14] workshops. The proposed method shows improvements in mean average precision.
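
To illustrate the comparison of two subword sequences described above, the following is a minimal sketch, not the authors' implementation. It aligns a query with a recognized sequence by dynamic programming and lets the local distance between two subwords be swapped between the 0/1 edit-distance cost and a graded acoustic cost. The function names, the insertion/deletion cost, and the numeric values in `toy_acoustic` are all hypothetical and chosen only for illustration; the abstract does not specify them, and the distances proposed in the paper are derived from DNN posterior probabilities rather than from a hand-written table.

```python
from typing import Callable, Sequence


def unit_cost(a: str, b: str) -> float:
    """Local cost of the plain edit distance: 0 for a match, 1 otherwise."""
    return 0.0 if a == b else 1.0


def sequence_distance(
    query: Sequence[str],
    hypothesis: Sequence[str],
    sub_cost: Callable[[str, str], float] = unit_cost,
    indel_cost: float = 1.0,  # assumed value, not taken from the paper
) -> float:
    """Whole-sequence alignment cost between two subword sequences."""
    n, m = len(query), len(hypothesis)
    # prev[j] holds the cost of aligning the first i-1 query subwords
    # with the first j hypothesis subwords.
    prev = [j * indel_cost for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * indel_cost] + [0.0] * m
        for j in range(1, m + 1):
            curr[j] = min(
                prev[j] + indel_cost,       # skip a query subword
                curr[j - 1] + indel_cost,   # skip a hypothesis subword
                prev[j - 1] + sub_cost(query[i - 1], hypothesis[j - 1]),
            )
        prev = curr
    return prev[m]


# Hypothetical acoustic distances: made-up numbers for illustration only.
TOY_TABLE = {("t", "d"): 0.3, ("d", "t"): 0.3}


def toy_acoustic(a: str, b: str) -> float:
    """Graded local cost: acoustically similar subwords cost less than 1."""
    if a == b:
        return 0.0
    return TOY_TABLE.get((a, b), 1.0)


edit = sequence_distance(["k", "a", "t"], ["k", "a", "d"])
acoustic = sequence_distance(["k", "a", "t"], ["k", "a", "d"], toy_acoustic)
```

With the 0/1 cost, a single misrecognized subword is penalized as heavily as any other substitution; with a graded cost, a confusion between acoustically close subwords contributes less to the total. This sketch aligns whole sequences only, whereas an STD system also has to locate the query inside a much longer recognized sequence.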


DOI: 10.21437/Interspeech.2017-634

Cite as: Kaneko, D., Konno, R., Kojima, K., Tanaka, K., Lee, S., Itoh, Y. (2017) Constructing Acoustic Distances Between Subwords and States Obtained from a Deep Neural Network for Spoken Term Detection. Proc. Interspeech 2017, 2879-2883, DOI: 10.21437/Interspeech.2017-634.


@inproceedings{Kaneko2017,
  author={Daisuke Kaneko and Ryota Konno and Kazunori Kojima and Kazuyo Tanaka and Shi-wook Lee and Yoshiaki Itoh},
  title={Constructing Acoustic Distances Between Subwords and States Obtained from a Deep Neural Network for Spoken Term Detection},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2879--2883},
  doi={10.21437/Interspeech.2017-634},
  url={http://dx.doi.org/10.21437/Interspeech.2017-634}
}