8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

High-level Feature Weighted GMM Network for Audio Stream Classification

Rongqing Huang, John H. L. Hansen

University of Colorado at Boulder, USA

Unsupervised audio classification continues to be a challenging research problem that significantly impacts ASR and Spoken Document Retrieval (SDR) performance. This paper addresses novel advances in audio classification for speech recognition. A new audio classification algorithm based on a Weighted GMM Network (WGN) is proposed. Two new high-level features, VSF (Variance of the Spectrum Flux) and VZCR (Variance of the Zero-Crossing Rate), are used to pre-classify the audio and to supply weights to the output probabilities of the GMM networks. Classification is then performed with the weighted GMM networks. Evaluations on a standard data set, the DARPA Hub4 Broadcast News 1997 evaluation data, show that the WGN classification algorithm achieves over a 50% improvement versus the GMM network baseline algorithm. The WGN also obtains very satisfactory results on the more diverse and challenging NGSW (National Gallery of the Spoken Word) corpus. A segmentation-based classification method is also explored.
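The abstract does not give formulas for the two high-level features or for the weighting rule, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes: the zero-crossing rate (ZCR) of a frame is the fraction of adjacent sample pairs whose sign differs; the spectrum flux between two frames is the squared distance between their normalized magnitude spectra; VZCR and VSF are the population variances of those per-frame values over an analysis window; and "weighting the output probabilities" means multiplying each class probability by a weight, i.e. adding log(weight) to that class's mean per-frame GMM log-likelihood. All function names, the diagonal-covariance GMM layout, and the toy parameters are hypothetical. How VSF and VZCR are mapped to class weights is not described in the abstract, so the weights are passed in directly.

```python
import cmath
import math
import statistics
from typing import Dict, List, Sequence, Tuple

# Hypothetical diagonal-covariance GMM: list of (mixture weight, means, variances).
Gmm = List[Tuple[float, List[float], List[float]]]


def split_frames(signal: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Cut the signal into (possibly overlapping) fixed-length frames."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs (assumed ZCR definition)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def normalized_spectrum(frame: Sequence[float]) -> List[float]:
    """Magnitude DFT for bins 0..N/2, scaled to sum to 1 (naive O(N^2) DFT)."""
    n = len(frame)
    mags = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]
    total = sum(mags) or 1.0  # avoid dividing by zero on silent frames
    return [m / total for m in mags]


def vzcr(signal: Sequence[float], frame_len: int, hop: int) -> float:
    """Variance of the per-frame zero-crossing rate over the analysis window."""
    return statistics.pvariance([zero_crossing_rate(f) for f in split_frames(signal, frame_len, hop)])


def vsf(signal: Sequence[float], frame_len: int, hop: int) -> float:
    """Variance of the spectrum flux, with flux taken as the squared distance
    between normalized spectra of consecutive frames (an assumed definition)."""
    spectra = [normalized_spectrum(f) for f in split_frames(signal, frame_len, hop)]
    flux = [sum((p - q) ** 2 for p, q in zip(s0, s1)) for s0, s1 in zip(spectra, spectra[1:])]
    return statistics.pvariance(flux)


def gmm_log_likelihood(x: Sequence[float], gmm: Gmm) -> float:
    """log p(x | GMM) for a diagonal-covariance mixture, combined with log-sum-exp."""
    comp = []
    for w, means, variances in gmm:
        ll = math.log(w)
        for xi, m, v in zip(x, means, variances):
            ll -= 0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        comp.append(ll)
    top = max(comp)
    return top + math.log(sum(math.exp(c - top) for c in comp))


def weighted_classify(features: Sequence[Sequence[float]],
                      class_gmms: Dict[str, Gmm],
                      class_weights: Dict[str, float]) -> str:
    """Pick the class maximizing mean frame log-likelihood + log(weight).
    Weights must be positive; multiplying a probability by a weight
    is the same as adding log(weight) in the log domain."""
    scores = {}
    for label, gmm in class_gmms.items():
        avg_ll = sum(gmm_log_likelihood(x, gmm) for x in features) / len(features)
        scores[label] = avg_ll + math.log(class_weights[label])
    return max(scores, key=scores.get)
```

For example, a window that switches from a rapidly alternating signal to a constant one, such as `[1.0, -1.0] * 16 + [1.0] * 32` with 16-sample frames and a 16-sample hop, has per-frame ZCRs of 1, 1, 0, 0 and therefore a VZCR of 0.25. Averaging the log-likelihood per frame, rather than summing it, is a design choice made here so that the weight is not swamped by segment length; the paper may combine the terms differently.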


Bibliographic reference.  Huang, Rongqing / Hansen, John H. L. (2004): "High-level feature weighted GMM network for audio stream classification", In INTERSPEECH-2004, 1061-1064.