Speech/non-speech sound classification is an important problem in audio diarization, audio document retrieval, and advanced human interfaces. This study develops spectral and temporal acoustic features for speech/non-speech sound classification, motivated by the production differences between speech and whistle. Seven time-domain and frequency-domain features are investigated. The proposed feature set is evaluated at the frame level on the task of speech/whistle classification, using support vector machines (SVM) and Gaussian mixture models (GMM) as back-end classifiers. At the frame level, the proposed front-end fusion yields absolute performance gains of 15.0% and 3.1% over mel-frequency cepstral coefficients (MFCC) with the SVM-based and GMM-based classifiers, respectively. This research will benefit the development of intelligent speech interfaces for identification, recognition, and speech coding, where it can serve as a preprocessing step for real-world audio streams.
Bibliographic reference. Nandwana, Mahesh Kumar / Bořil, Hynek / Hansen, John H. L. (2015): "A new front-end for classification of non-speech sounds: a study on human whistle", In INTERSPEECH-2015, 1982-1986.