For a hands-free speech interface, it is important to detect commands within spontaneous utterances. Conventional voice activity detection systems can only distinguish speech frames from non-speech frames; they cannot determine whether a detected speech segment is a command addressed to the system. In this paper, to analyze the difference between system requests and spontaneous utterances, we focus on long-term fluctuations, such as prosodic articulation, and short-term fluctuations, such as phoneme articulation. Multi-resolution analysis using Gabor wavelets on log-scale Mel-frequency filter-bank features clarifies the differing characteristics of system commands and spontaneous utterances. Experiments on our robot dialog corpus show that the proposed method achieves an F-measure of 92.6%, whereas the conventional power- and prosody-based method achieves only 66.7%.
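The idea of analyzing log Mel filter-bank trajectories at several time resolutions can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`gabor_wavelet`, `multires_gabor_features`), the scales, the envelope width `sigma`, the carrier frequency `omega0`, the edge padding, and the use of response magnitudes are all assumptions made for illustration. The input is assumed to be a precomputed `(frames, bands)` array of log Mel filter-bank energies.

```python
import numpy as np


def gabor_wavelet(scale, sigma=1.0, omega0=np.pi, half_width=4.0):
    """Complex 1-D Gabor wavelet dilated by `scale` (in frames).

    Assumed form: Gaussian envelope times a complex sinusoid, so the
    centre frequency is omega0 / scale radians per frame.
    """
    n = int(np.ceil(half_width * sigma * scale))
    t = np.arange(-n, n + 1) / scale
    envelope = np.exp(-(t ** 2) / (2.0 * sigma ** 2))
    carrier = np.exp(1j * omega0 * t)
    return envelope * carrier / np.sqrt(scale)


def multires_gabor_features(log_mel, scales=(2, 4, 8, 16)):
    """Filter each filter-bank channel along time at several scales.

    log_mel: array of shape (frames, bands).
    Returns magnitudes of shape (frames, bands * len(scales)).
    Small scales respond to fast (phoneme-like) fluctuations, large
    scales to slow (prosody-like) fluctuations.
    """
    log_mel = np.asarray(log_mel, dtype=float)
    n_bands = log_mel.shape[1]
    outputs = []
    for s in scales:
        kernel = gabor_wavelet(s)
        pad = (len(kernel) - 1) // 2
        # Edge padding keeps the output at the original number of frames.
        padded = np.pad(log_mel, ((pad, pad), (0, 0)), mode="edge")
        response = np.stack(
            [np.convolve(padded[:, b], kernel, mode="valid") for b in range(n_bands)],
            axis=1,
        )
        outputs.append(np.abs(response))
    return np.concatenate(outputs, axis=1)


if __name__ == "__main__":
    # Random stand-in for real log Mel filter-bank energies.
    rng = np.random.default_rng(0)
    demo = rng.normal(size=(200, 24))
    print(multires_gabor_features(demo).shape)
```

The resulting per-frame vectors could then be passed to any classifier that separates system requests from other speech; the choice of classifier is outside the scope of this sketch.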
Bibliographic reference. Yamagata, Tomoyuki / Takiguchi, Tetsuya / Ariki, Yasuo (2009): "System request detection in human conversation based on multi-resolution Gabor wavelet features", In INTERSPEECH-2009, 256-259.