We introduce a new class of speech processing task, called Intentional Voice Command Detection (IVCD). To achieve a completely hands-free speech interface, a system must reject not only noise but also unintended voices. The conventional voice activity detection (VAD) framework is not sufficient for this purpose, so we discuss how IVCD should be defined and how it can be realized. We investigate the implementation of IVCD from the viewpoints of feature extraction and classification, and show that combining various features with a support vector machine (SVM) achieves an IVCD accuracy of 93.2% on a large-scale audio database recorded in real home environments.
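As a rough illustration only, and not the system evaluated in the paper, the general idea of feeding several per-segment features to an SVM can be sketched as follows. Everything here is an assumption made for the example: the two features (log energy and zero-crossing rate), the synthetic signals standing in for intended and unintended audio, and the linear soft-margin SVM trained by subgradient descent on the hinge loss. The abstract does not specify which features or which SVM variant were used.

```python
import math
import random


def frame_features(samples):
    """Two illustrative features (hypothetical choice): log energy and zero-crossing rate."""
    energy = sum(s * s for s in samples) / len(samples)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return [math.log(energy + 1e-12), crossings / (len(samples) - 1)]


def synth_segment(rng, intended, n=400):
    """Synthetic stand-in for audio (an assumption, not real speech data)."""
    if intended:
        # Loud low-frequency tone with a little noise.
        return [0.8 * math.sin(2 * math.pi * 5 * t / n) + rng.gauss(0, 0.05) for t in range(n)]
    # Quiet broadband noise.
    return [rng.gauss(0, 0.1) for _ in range(n)]


def make_dataset(rng, count):
    """Alternate intended (+1) and unintended (-1) segments."""
    X, y = [], []
    for i in range(count):
        intended = i % 2 == 0
        X.append(frame_features(synth_segment(rng, intended)))
        y.append(1 if intended else -1)
    return X, y


def fit_scaler(X):
    """Per-feature mean and standard deviation, estimated on the training set."""
    dims = len(X[0])
    means = [sum(row[d] for row in X) / len(X) for d in range(dims)]
    stds = [
        math.sqrt(sum((row[d] - means[d]) ** 2 for row in X) / len(X)) or 1.0
        for d in range(dims)
    ]
    return means, stds


def scale(X, means, stds):
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=100, seed=0):
    """Linear soft-margin SVM via stochastic subgradient descent on the hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1 - lr * lam) for wj in w]  # L2 regularization shrinkage
            if margin < 1:  # sample violates the margin, so the hinge term is active
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def run_demo(seed=0):
    """Train on one synthetic set and return accuracy on a second, held-out set."""
    rng = random.Random(seed)
    X_train, y_train = make_dataset(rng, 40)
    X_test, y_test = make_dataset(rng, 40)
    means, stds = fit_scaler(X_train)
    w, b = train_linear_svm(scale(X_train, means, stds), y_train)
    preds = [predict(w, b, x) for x in scale(X_test, means, stds)]
    return sum(p == t for p, t in zip(preds, y_test)) / len(y_test)


if __name__ == "__main__":
    print("held-out accuracy on synthetic data:", run_demo())
```

The accuracy this toy script reports says nothing about the 93.2% figure above, which was measured on real home recordings. The sketch only shows the shape of a feature-plus-SVM pipeline: extract a feature vector per segment, normalize it, and train a margin-based classifier.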
Bibliographic reference. Obuchi, Yasunari / Togami, Masahito / Sumiyoshi, Takashi (2008): "Intentional voice command detection for completely hands-free speech interface in home environments", In INTERSPEECH-2008, 119-122.