In this work, we adopt an information-theoretic approach, the Information Bottleneck method, to extract the spectro-temporal modulations relevant to speech/non-speech discrimination, where non-speech events include music, noise, and animal vocalizations. A compact representation (a "cluster prototype") is built for each class, consisting of the features that are maximally informative with respect to the classification task. We assess the similarity of a sound to each cluster prototype using the spectro-temporal modulation index (STMI), adapted to handle the contribution of different frequency bands. A simple threshold check then discriminates speech from non-speech events. Experiments show that the proposed method has low complexity and achieves high discrimination accuracy in low-SNR conditions compared to recently proposed methods for the same task.
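The final decision stage described above (similarity to each cluster prototype, followed by a threshold check) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the commonly cited form of the index, STMI = 1 - ||T - N||^2 / ||T||^2, with T a prototype and N the observed feature vector, and it omits the paper's frequency-band adaptation. The function names, the use of a score difference, and the default threshold are all hypothetical.

```python
def stmi(template, observed):
    """Similarity of `observed` to `template`: 1 - ||T - N||^2 / ||T||^2.

    Assumed generic form of the index; the band-wise adaptation
    mentioned in the abstract is not modelled here.
    """
    if len(template) != len(observed):
        raise ValueError("template and observed must have the same length")
    energy = sum(t * t for t in template)
    if energy == 0:
        raise ValueError("template has zero energy")
    distortion = sum((t - o) ** 2 for t, o in zip(template, observed))
    return 1.0 - distortion / energy


def is_speech(features, speech_proto, nonspeech_proto, threshold=0.0):
    """Hypothetical decision rule: compare the two similarities.

    Returns True when the features are closer to the speech prototype
    than to the non-speech prototype by more than `threshold`.
    """
    score = stmi(speech_proto, features) - stmi(nonspeech_proto, features)
    return score > threshold
```

In this sketch the prototypes are plain lists of feature values; in practice they would hold the modulation features selected by the Information Bottleneck step, and the threshold would be tuned on held-out data.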
Bibliographic reference. Markaki, Maria / Wohlmayr, Michael / Stylianou, Yannis (2007): "Speech-nonspeech discrimination using the information bottleneck method and spectro-temporal modulation index", In INTERSPEECH-2007, 2913-2916.