We present two approaches to acoustic event detection for speech-enabled car applications: a generative GMM-UBM approach and a discriminative GMM-SVM supervector approach. The systems detect whether or not a certain acoustic event occurred while the car's built-in microphone was active to record a spoken command, i.e., before, during, or after the driver's speech. Such events include music playing, a phone ringing, a passenger other than the driver talking, laughing, or coughing. The task is formally defined as a detection task along the lines of well-established detection tasks such as speaker recognition or language recognition. Similarly, the evaluation procedure was designed to resemble the respective official evaluation series performed by NIST, i.e., a blind 'one-shot' evaluation on a separately provided dataset. System performance was measured in terms of detection miss and false alarm probabilities, combined into a detection cost with CMiss = CFA = 1 and PTarget = 0.5. The superior GMM-SVM system achieved a cost of 0.0345 for known test speakers and 0.1955 for novel test speakers. Frequency-filtered band energy coefficients (FFBE) outperformed MFCCs on this task. The results are promising and suggest further experiments on more data.
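The cost parameters quoted above match the NIST-style detection cost function; with CMiss = CFA = 1 and PTarget = 0.5 the cost reduces to the average of the miss and false-alarm probabilities. A minimal sketch of that computation (the function name and example probabilities are illustrative, not taken from the paper):

```python
def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=1.0, p_target=0.5):
    """NIST-style detection cost function (DCF).

    Combines the miss probability and the false-alarm probability,
    weighted by their costs and by the target prior.
    """
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


# With the paper's settings (all defaults), the cost is simply
# the mean of the two error probabilities:
cost = detection_cost(p_miss=0.10, p_fa=0.20)  # 0.5*0.10 + 0.5*0.20 = 0.15
```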
Bibliographic reference. Müller, Christian / Biel, Joan-Isaac / Kim, Edward / Rosario, Daniel (2008): "Speech-overlapped acoustic event detection for automotive applications", In INTERSPEECH-2008, 2590-2593.