Machine Listening in Multisource Environments (CHiME) 2011

Florence, Italy
September 1, 2011

The Munich 2011 CHiME Challenge Contribution: NMF-BLSTM Speech Enhancement and Recognition for Reverberated Multisource Environments

Felix Weninger, Jürgen Geiger, Martin Wöllmer, Björn Schuller, Gerhard Rigoll

Institute for Human-Machine Communication, Technische Universität München, Germany

We present the Munich contribution to the PASCAL ‘CHiME’ Speech Separation and Recognition Challenge. Our approach combines source separation by supervised convolutive non-negative matrix factorisation (NMF) with our tandem recogniser, which augments acoustic features with the word predictions of a Long Short-Term Memory recurrent neural network in a multi-stream Hidden Markov Model. We evaluate our source separation approach with a sequence of progressively refined speech recognisers. NMF substantially improves performance for every recogniser investigated, and the best results are obtained with the multi-stream approach combined with a novel adaptation technique for the noise dictionaries in supervised NMF. On the final Challenge test set, the proposed system achieves an average keyword recognition accuracy of 87.86% across SNRs ranging from -6 to 9 dB, reducing the keyword error rate from 44% to 12% compared to the Challenge baseline.
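To illustrate the core idea of supervised NMF separation, here is a minimal sketch under stated assumptions. It uses plain (non-convolutive) NMF with a KL-divergence multiplicative update and a soft Wiener-like mask, and it assumes the speech and noise dictionaries have already been learned from training data. The paper's actual system uses a convolutive variant and additionally adapts the noise dictionary; neither is shown here. The function name `supervised_nmf_separate` and its parameters are hypothetical.

```python
import numpy as np

def supervised_nmf_separate(V, W_speech, W_noise, n_iter=200, eps=1e-9, seed=0):
    """Hypothetical sketch: estimate the speech part of a magnitude
    spectrogram V (frequency bins x frames) using fixed, pre-trained
    speech and noise dictionaries (plain NMF, not convolutive)."""
    # Supervised NMF: the joint dictionary is fixed; only activations H are learned.
    W = np.hstack([W_speech, W_noise])
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps  # non-negative initial activations
    ones = np.ones_like(V, dtype=float)
    for _ in range(n_iter):
        # Multiplicative update minimising the KL divergence D(V || W H).
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)
    k = W_speech.shape[1]
    speech_model = W_speech @ H[:k]  # speech part of the reconstruction
    noise_model = W_noise @ H[k:]    # noise part of the reconstruction
    # Soft (Wiener-like) mask applied to the observed mixture.
    mask = speech_model / (speech_model + noise_model + eps)
    return mask * V, H
```

In a toy case where speech and noise occupy disjoint frequency bins, the mask recovers the speech bins and zeroes the noise bins. Real speech and noise spectra overlap, which is why the choice and adaptation of the noise dictionary matter.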

Index Terms. Non-Negative Matrix Factorisation, Tandem Speech Recognition


Bibliographic reference.  Weninger, Felix / Geiger, Jürgen / Wöllmer, Martin / Schuller, Björn / Rigoll, Gerhard (2011): "The Munich 2011 CHiME challenge contribution: NMF-BLSTM speech enhancement and recognition for reverberated multisource environments", In CHiME-2011, 24-29.