We have developed a method for determining whether a user utterance is directed at the system. A spoken dialogue system should not respond to audio inputs that are not directed at it (e.g., a user's muttering), so it needs to detect such inputs to avoid unsuitable responses. We classify the two cases by logistic regression over a feature set that includes utterance timing, utterance length, and dialogue status. We conducted experiments on 5395 user utterances, using both transcriptions and automatic speech recognition results. The classification accuracy improved by 11.0 points for transcriptions and by 4.1 points for automatic speech recognition results. We also discuss which features are effective for the classification.
Index Terms: spoken dialogue system, system-directed utterance, utterance timing
Bibliographic reference. Komatani, Kazunori / Hirano, Akira / Nakano, Mikio (2012): "Detecting system-directed utterances using dialogue-level features", In INTERSPEECH-2012, 230-233.
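The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three feature encodings (seconds elapsed since the system prompt, utterance duration in seconds, and a binary flag for whether the system is awaiting a reply), the function names, the training settings, and the toy data are all assumptions made up for illustration, and the paper's actual feature definitions may differ.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the mean log loss."""
    n_features = len(X[0])
    n = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log loss w.r.t. the logit
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_directed(w, b, x, threshold=0.5):
    """Return True if the utterance is classified as system-directed."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold

# Synthetic toy data, invented for this sketch (NOT from the paper).
# Hypothetical features: [timing_sec, length_sec, system_awaiting_reply]
X = [
    [0.5, 2.0, 1.0],
    [0.8, 1.5, 1.0],
    [1.0, 2.5, 1.0],
    [0.3, 1.8, 1.0],
    [6.0, 0.4, 0.0],
    [8.0, 0.3, 0.0],
    [5.0, 0.6, 0.0],
    [7.0, 0.5, 0.0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = system-directed, 0 = not directed

w, b = train_logistic_regression(X, y)
print(predict_directed(w, b, [0.6, 2.0, 1.0]))
print(predict_directed(w, b, [7.5, 0.4, 0.0]))
```

In this toy setting, a prompt reply while the system awaits input is scored as system-directed, while a short, late utterance is not. Inspecting the learned weights is one simple way to see which features drive the decision.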