For a hands-free speech interface, it is important to detect commands within spontaneous utterances. To discriminate commands from human-human conversation using acoustic features, it is effective to consider the head and the tail of an utterance, because the differing characteristics of system requests and spontaneous utterances appear in these parts. Experiments show that treating the head and the tail of an utterance separately improved detection accuracy. Considering the alternation of speakers, captured with two-channel microphones, also improved performance. Although detecting system requests using linguistic features shows high accuracy, adding acoustic and turn-taking features improves performance further.
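As a rough illustration of the head/tail idea, the sketch below summarises a per-frame acoustic track separately over the first and last few frames of an utterance, and derives a simple speaker-alternation flag from the active microphone channel. The abstract does not specify the acoustic features, the segment length, or the classifier, so the function names, the `edge_frames` parameter, and the use of a plain mean are all assumptions made for illustration, not the authors' method.

```python
from statistics import mean


def head_tail_features(frames, edge_frames=3):
    """Summarise a per-frame feature track by head, tail, and whole-utterance means.

    `frames` is a list of per-frame values (for example log energy or F0).
    `edge_frames` is a hypothetical segment length; the abstract gives no value.
    """
    if not frames:
        raise ValueError("utterance has no frames")
    # Clamp the segment length so very short utterances still work.
    k = min(edge_frames, len(frames))
    return {
        "head_mean": mean(frames[:k]),
        "tail_mean": mean(frames[-k:]),
        "all_mean": mean(frames),
    }


def speaker_alternated(prev_channel, cur_channel):
    """Return 1 if the active channel changed since the previous utterance, else 0.

    Assumes each speaker is captured on a separate microphone channel;
    `prev_channel` is None for the first utterance.
    """
    return int(prev_channel is not None and prev_channel != cur_channel)


if __name__ == "__main__":
    track = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(head_tail_features(track, edge_frames=2))
    print(speaker_alternated(0, 1))
```

Keeping the head and tail summaries as separate features, rather than averaging over the whole utterance, lets a downstream classifier weight those regions independently; which classifier to use is left open here.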
Bibliographic reference. Yamagata, Tomoyuki / Sako, Atsushi / Takiguchi, Tetsuya / Ariki, Yasuo (2007): "System request detection in conversation based on acoustic and speaker alternation features", In INTERSPEECH-2007, 2789-2792.