ISCA Archive Interspeech 2009

System request detection in human conversation based on multi-resolution Gabor wavelet features

Tomoyuki Yamagata, Tetsuya Takiguchi, Yasuo Ariki

For a hands-free speech interface, it is important to detect commands within spontaneous utterances. Conventional voice activity detection systems can only distinguish speech frames from non-speech frames; they cannot determine whether a detected speech section is a command addressed to the system. In this paper, to analyze the difference between system requests and spontaneous utterances, we focus on long-period fluctuations, such as prosodic articulation, and short-period fluctuations, such as phoneme articulation. Multi-resolution analysis using Gabor wavelets on a log-scale Mel-frequency filter-bank clarifies the different characteristics of system commands and spontaneous utterances. Experiments on our robot dialog corpus show that the proposed method achieves an F-measure of 92.6%, whereas the conventional power- and prosody-based method achieves only 66.7%.
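
The multi-resolution idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the wavelet parameters or say whether the filtering is one- or two-dimensional, so everything below is an assumption: a complex 1-D Gabor kernel is applied along the time axis of each filter-bank channel, the widths in `sigmas` and the constant `omega0` are hypothetical, and random numbers stand in for real log Mel filter-bank energies. Narrow kernels respond to short-period (phoneme-like) fluctuations and wide kernels to long-period (prosody-like) ones.

```python
import numpy as np


def gabor_kernel(sigma, omega0=np.pi / 2):
    """Complex 1-D Gabor kernel with Gaussian width `sigma` (in frames).

    `omega0` is a hypothetical constant: the centre frequency is
    omega0 / sigma, so wider kernels track slower fluctuations.
    """
    half = int(np.ceil(3 * sigma))  # truncate the envelope at 3 sigma
    t = np.arange(-half, half + 1, dtype=float)
    envelope = np.exp(-(t ** 2) / (2.0 * sigma ** 2))
    carrier = np.exp(1j * (omega0 / sigma) * t)
    return envelope * carrier / envelope.sum()


def multires_gabor_features(log_mel, sigmas=(2, 8, 32)):
    """Filter each channel of `log_mel` (frames x channels) at several scales.

    Returns the response magnitudes, concatenated along the channel axis,
    giving an array of shape (frames, channels * len(sigmas)).
    """
    log_mel = np.asarray(log_mel, dtype=float)
    n_frames, n_channels = log_mel.shape
    blocks = []
    for sigma in sigmas:
        kernel = gabor_kernel(sigma)
        if kernel.size > n_frames:
            raise ValueError("signal is shorter than the Gabor kernel")
        # Convolve every filter-bank channel with the same temporal kernel.
        response = np.stack(
            [np.convolve(log_mel[:, c], kernel, mode="same")
             for c in range(n_channels)],
            axis=1,
        )
        blocks.append(np.abs(response))
    return np.concatenate(blocks, axis=1)


# Stand-in input: 300 frames of 24-channel "log Mel" energies (random data,
# not taken from the corpus described in the abstract).
rng = np.random.default_rng(0)
log_mel = rng.normal(size=(300, 24))
features = multires_gabor_features(log_mel)
```

In a detector of the kind the abstract describes, features like these would be passed to a classifier that separates system requests from other speech; the classifier itself is outside the scope of this sketch.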


doi: 10.21437/Interspeech.2009-89

Cite as: Yamagata, T., Takiguchi, T., Ariki, Y. (2009) System request detection in human conversation based on multi-resolution Gabor wavelet features. Proc. Interspeech 2009, 256-259, doi: 10.21437/Interspeech.2009-89

@inproceedings{yamagata09_interspeech,
  author={Tomoyuki Yamagata and Tetsuya Takiguchi and Yasuo Ariki},
  title={{System request detection in human conversation based on multi-resolution Gabor wavelet features}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={256--259},
  doi={10.21437/Interspeech.2009-89}
}