INTERSPEECH 2004 - ICSLP
In this paper, we introduce a new framework for speech detection using convolutional networks. We propose a network architecture that incorporates both long-term and short-term temporal and spectral correlations of speech in the detection process. The proposed design addresses several shortcomings of existing speech detectors within a single framework. First, it improves the robustness of the system to environmental variability while remaining fast to evaluate. Second, it is extensible to different time scales for different applications. Finally, it is discriminative and produces reliable estimates of the probability that speech is present in each frame under a wide variety of noise conditions. We propose that the inputs to the system be features that measure the true signal-to-noise ratio in a set of frequency bands of the signal. These features can be generated easily and automatically by tracking the noise spectrum online. We present preliminary results on the AURORA database to demonstrate the effectiveness of the detector relative to conventional Gaussian detectors.
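As a rough illustration of the input features described above, the following minimal sketch computes a per-band signal-to-noise estimate in dB while tracking the noise level online. The abstract does not specify the noise tracker, so the tracking rule used here (follow drops in energy immediately, allow only a slow rise) is an assumption, as are the function name `band_snr_features`, the `rise` and `floor` parameters, and the assumption that the first frame contains noise only. Band energies are taken as already computed.

```python
import math


def band_snr_features(band_energies, rise=1.05, floor=1e-10):
    """Estimate per-band SNR (dB) for each frame with an online noise tracker.

    band_energies: list of frames; each frame is a list of linear band powers.
    rise:  assumed maximum multiplicative growth of the noise estimate per frame.
    floor: small constant that keeps the ratio and the logarithm well defined.
    """
    noise = None
    features = []
    for frame in band_energies:
        if noise is None:
            # Assumption: the first frame is noise only and seeds the estimate.
            noise = [max(e, floor) for e in frame]
        else:
            # Follow energy drops immediately, but let the estimate rise slowly,
            # so short bursts of speech do not inflate the noise level.
            noise = [max(min(e, n * rise), floor) for e, n in zip(frame, noise)]
        # Ratio of current band energy to tracked noise energy, in dB.
        features.append(
            [10.0 * math.log10(max(e, floor) / n) for e, n in zip(frame, noise)]
        )
    return features
```

Calling `band_snr_features` on a sequence of frames returns one feature vector per frame, with one value per frequency band. In a detector of the kind described, a window of consecutive feature vectors would form the time-frequency input to the network. This tracker is only a stand-in for whatever estimator the full paper uses.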
Bibliographic reference. Sukittanon, Somsak / Surendran, Arun C. / Platt, John C. / Burges, Chris J.C. (2004): "Convolutional networks for speech detection", In INTERSPEECH-2004, 1077-1080.