Machine Listening in Multisource Environments (CHiME) 2011

Florence, Italy
September 1, 2011

Robust Speech Recognition in Multi-Source Noise Environments using Convolutive Non-Negative Matrix Factorization

Ravichander Vipperla, Simon Bozonnet, Dong Wang, Nicholas Evans

Multimedia Communications Department, EURECOM, Sophia Antipolis, France

Convolutive non-negative matrix factorization (CNMF) is an effective approach for supervised audio source separation. It relies on the availability of sufficient training data to learn a set of bases for each acoustic source. For automatic speech recognition (ASR) in a multi-source noise environment, the varied nature of the background noise makes it challenging to learn the noise bases and thereby to suppress the noise in the speech signal using CNMF. A large amount of training data is required to capture the noise variation reliably, but this generally leads to an unacceptable computational burden. Here, we address this problem by learning the noise bases with a computationally efficient, online CNMF approach. By learning the noise bases from several hours of ambient noise data and from a few seconds of local acoustic context, we show that background noise can be effectively attenuated in noisy speech. ASR accuracies on the CHiME corpus with the denoised speech show relative improvements ranging from 42.3% at -6 dB signal-to-noise ratio (SNR) to 2.5% at 9 dB SNR.
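
To make the supervised separation idea in the abstract concrete, the following is a minimal sketch of batch CNMF denoising in Python with NumPy. It is not the online algorithm the paper proposes, and it is not taken from the paper. The function names (`shift`, `reconstruct`, `cnmf`, `learn_bases`, `denoise`), the KL-style multiplicative updates, the soft-mask reconstruction, and the random matrices standing in for magnitude spectrograms are all assumptions made for illustration.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero


def shift(X, t):
    """Shift the columns of X right by t frames (left if t < 0), zero-filling."""
    out = np.zeros_like(X)
    if t == 0:
        out[:] = X
    elif t > 0:
        out[:, t:] = X[:, :-t]
    else:
        out[:, :t] = X[:, -t:]
    return out


def reconstruct(W, H):
    """Convolutive model: sum over t of W[t] @ shift(H, t). W has shape (T, F, R)."""
    return sum(W[t] @ shift(H, t) for t in range(W.shape[0]))


def cnmf(V, W, H, n_iter=50, update_W=True):
    """Multiplicative updates for CNMF under a KL-style cost (illustrative)."""
    W, H = W.copy(), H.copy()
    T = W.shape[0]
    ones = np.ones_like(V)
    for _ in range(n_iter):
        ratio = V / (reconstruct(W, H) + EPS)
        # Each time shift yields its own activation update; average them.
        H_new = np.zeros_like(H)
        for t in range(T):
            H_new += H * (W[t].T @ shift(ratio, -t)) / (W[t].T @ ones + EPS)
        H = H_new / T
        if update_W:
            ratio = V / (reconstruct(W, H) + EPS)
            for t in range(T):
                Ht = shift(H, t)
                W[t] *= (ratio @ Ht.T) / (ones @ Ht.T + EPS)
    return W, H


def learn_bases(V, R, T, rng, n_iter=50):
    """Learn R convolutive bases spanning T frames from one source's training data."""
    F, N = V.shape
    W0 = rng.random((T, F, R)) + EPS
    H0 = rng.random((R, N)) + EPS
    W, _ = cnmf(V, W0, H0, n_iter)
    return W


def denoise(V_mix, W_speech, W_noise, rng, n_iter=50):
    """Keep both basis sets fixed, fit activations, and apply a soft mask."""
    W = np.concatenate([W_speech, W_noise], axis=2)  # stack along the basis axis
    R_s = W_speech.shape[2]
    H0 = rng.random((W.shape[2], V_mix.shape[1])) + EPS
    _, H = cnmf(V_mix, W, H0, n_iter, update_W=False)
    speech = reconstruct(W_speech, H[:R_s])
    noise = reconstruct(W_noise, H[R_s:])
    return V_mix * speech / (speech + noise + EPS)


# Synthetic non-negative matrices stand in for magnitude spectrograms.
rng = np.random.default_rng(0)
V_speech = rng.random((20, 60))
V_noise = rng.random((20, 60))
W_s = learn_bases(V_speech, R=4, T=3, rng=rng)
W_n = learn_bases(V_noise, R=4, T=3, rng=rng)
V_mix = V_speech + V_noise  # assumes magnitudes add, which holds only approximately
S_hat = denoise(V_mix, W_s, W_n, rng)
```

In this sketch the bases are learned in batch from all of the training data at once, which is the step whose cost grows with the amount of noise data. The abstract's online approach targets that cost; its details are in the full paper and are not reproduced here.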

Index Terms. Convolutive non-negative matrix factorization, online CNMF, speech separation, automatic speech recognition

Bibliographic reference. Vipperla, Ravichander / Bozonnet, Simon / Wang, Dong / Evans, Nicholas (2011): "Robust speech recognition in multi-source noise environments using convolutive non-negative matrix factorization", In CHiME-2011, 74-79.