EUROSPEECH 2003 - INTERSPEECH 2003
Accurate discrimination between speech and non-speech is essential to many speech processing tasks. This paper presents an approach to the classification stage of a Voice Activity Detector (VAD). Possible shortcomings of present VAD systems are described, and a classification approach that overcomes these weaknesses is derived. The approach is based on a Self-Organizing Map (SOM), a neural network that is able to detect clusters within the feature space of its training data. The classifier is trained in two steps: first, the SOM itself is trained; once this is finished, the SOM is used in a second training step to learn the mapping between its classes and the desired output, "speech" or "non-speech". Experiments on a database of audio samples obtained under different noise conditions show the potential of the proposed algorithm.
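The two-step training described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the abstract does not specify the features, the map size, the learning schedule, or how the second step learns the mapping. The sketch assumes a small rectangular map with a Gaussian neighbourhood, and it substitutes a simple majority vote per map node for the second training step. All function names, parameters, and the toy data are hypothetical.

```python
import numpy as np

def make_toy_data(n_per_class=200, seed=1):
    """Synthetic 2-D 'features' (an assumption): label 0 = non-speech, 1 = speech."""
    rng = np.random.default_rng(seed)
    non_speech = rng.normal(loc=0.0, scale=0.5, size=(n_per_class, 2))
    speech = rng.normal(loc=3.0, scale=0.5, size=(n_per_class, 2))
    features = np.vstack([non_speech, speech])
    labels = np.concatenate([np.zeros(n_per_class, dtype=int),
                             np.ones(n_per_class, dtype=int)])
    return features, labels

def train_som(features, grid=(4, 4), epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Step 1: unsupervised SOM training. Returns weights of shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid position of every node, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise node weights from randomly chosen training samples.
    weights = features[rng.integers(0, len(features), rows * cols)].astype(float)
    total_steps = epochs * len(features)
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(len(features)):
            x = features[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-2    # shrinking neighbourhood radius
            # Best-matching unit: node whose weight vector is closest to x.
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            grid_dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull the BMU and its grid neighbours towards the sample.
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def best_matching_units(weights, features):
    """Index of the closest node for each feature vector."""
    d2 = ((features[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)

def label_nodes(weights, features, labels):
    """Step 2 (assumed majority vote): map each SOM node to speech / non-speech."""
    bmus = best_matching_units(weights, features)
    votes = np.zeros((len(weights), 2))
    np.add.at(votes, (bmus, labels), 1)   # count labelled hits per node
    node_labels = np.argmax(votes, axis=1)
    hit = votes.sum(axis=1) > 0
    # Nodes that no training sample mapped to copy the label of the
    # nearest node (in weight space) that did receive samples.
    for n in np.where(~hit)[0]:
        d2 = np.sum((weights[hit] - weights[n]) ** 2, axis=1)
        node_labels[n] = node_labels[hit][np.argmin(d2)]
    return node_labels

def classify(weights, node_labels, features):
    """Classify frames by looking up the label of their best-matching unit."""
    return node_labels[best_matching_units(weights, features)]
```

In use, one would call `train_som` on unlabelled feature vectors, then `label_nodes` with a labelled subset, and finally `classify` on new frames. Separating the two steps in this way means the map can be trained without labels, and only the node-to-class mapping needs annotated data.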
Bibliographic reference. Grashey, Stephan (2003): "A new approach to voice activity detection based on self-organizing maps", In EUROSPEECH-2003, 1733-1736.