INTERSPEECH 2004 - ICSLP
To realize a robust spoken dialogue system for use in a real environment, GMM-based rejection of unintended inputs such as laughter, coughing, background speech and other noise is implemented and evaluated on actual utterances. All triggered inputs to a speech-oriented guidance system were collected over 125 days of field testing in a public space, and the occurrence of unintended inputs was investigated. Based on the results of this analysis, GMM classifiers were trained for voice categories (adult speech and child speech) and non-voice categories (laughter, coughing and other noises). Rejection of unintended inputs was evaluated on actual, uncontrolled inputs: the 5-class GMM achieved an equal error rate (EER) of 3.32%, outperforming a simple 2-class (voice / non-voice) GMM. The rejection of background speech using GMM is also investigated.
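The abstract does not give the features, mixture sizes, or decision rule, so the following is only a minimal sketch of how class-wise GMM rejection and an EER measurement could be set up. It assumes diagonal-covariance GMMs, a score defined as the best voice-class log-likelihood minus the best non-voice-class log-likelihood, and toy 2-D models; the names `gmm_loglik`, `voice_score`, `equal_error_rate`, and `TOY_MODELS` are hypothetical and not taken from the paper.

```python
import numpy as np

VOICE_CLASSES = ("adult", "child")
NONVOICE_CLASSES = ("laughter", "cough", "noise")


def gmm_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (K,); means and variances: (K, D).
    """
    diff = frames[:, None, :] - means[None, :, :]            # (T, K, D)
    log_comp = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)                # (T, K)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)    # (K,)
    )
    log_comp = log_comp + np.log(weights)
    # Log-sum-exp over mixture components, stabilised by the row maximum.
    row_max = log_comp.max(axis=1, keepdims=True)
    per_frame = row_max[:, 0] + np.log(np.exp(log_comp - row_max).sum(axis=1))
    return float(per_frame.mean())


def voice_score(frames, models):
    """Assumed score: best voice log-likelihood minus best non-voice one."""
    best_voice = max(gmm_loglik(frames, *models[c]) for c in VOICE_CLASSES)
    best_other = max(gmm_loglik(frames, *models[c]) for c in NONVOICE_CLASSES)
    return best_voice - best_other


def equal_error_rate(voice_scores, nonvoice_scores):
    """Sweep thresholds and return the error rate where FRR and FAR are closest."""
    voice_scores = np.asarray(voice_scores, dtype=float)
    nonvoice_scores = np.asarray(nonvoice_scores, dtype=float)
    best_gap, eer = np.inf, 1.0
    for t in np.sort(np.concatenate([voice_scores, nonvoice_scores])):
        frr = np.mean(voice_scores < t)       # voice inputs wrongly rejected
        far = np.mean(nonvoice_scores >= t)   # non-voice inputs wrongly accepted
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), 0.5 * (frr + far)
    return float(eer)


def _toy_model(mean):
    # Single-component, unit-variance GMM; purely illustrative parameters.
    mean = np.asarray(mean, dtype=float)[None, :]
    return np.array([1.0]), mean, np.ones_like(mean)


TOY_MODELS = {
    "adult": _toy_model([0.0, 0.0]),
    "child": _toy_model([2.0, 2.0]),
    "laughter": _toy_model([-5.0, -5.0]),
    "cough": _toy_model([5.0, -5.0]),
    "noise": _toy_model([-5.0, 5.0]),
}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech_like = rng.normal([0.0, 0.0], 0.5, size=(50, 2))
    print("accept" if voice_score(speech_like, TOY_MODELS) >= 0.0 else "reject")
```

In this sketch an input is accepted when its score is at or above a threshold, and the threshold at which false rejections and false acceptances balance gives the EER. A 2-class variant would simply use one pooled voice model and one pooled non-voice model in place of the five class models.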
Bibliographic reference. Lee, Akinobu / Nakamura, Keisuke / Nisimura, Ryuichi / Saruwatari, Hiroshi / Shikano, Kiyohiro (2004): "Noise robust real world spoken dialogue system using GMM based rejection of unintended inputs", In INTERSPEECH-2004, 173-176.