Machine Listening in Multisource Environments (CHiME) 2011, Florence, Italy
We present the Munich contribution to the PASCAL CHiME Speech Separation and Recognition Challenge. Our approach combines source separation by supervised convolutive non-negative matrix factorisation (NMF) with our tandem recogniser, which augments acoustic features with word predictions from a Long Short-Term Memory recurrent neural network in a multi-stream Hidden Markov Model. The performance of our source separation approach is demonstrated in a sequence of gradually refined speech recognisers. While NMF drastically improves performance for all investigated recognisers, the best results are obtained with the multi-stream approach along with a novel adaptation technique for noise dictionaries in supervised NMF. On the final Challenge test set, the proposed system delivers an average keyword recognition accuracy of 87.86% across SNRs ranging from -6 to 9 dB, reducing the error rate from 44% to 12% compared to the Challenge baseline.
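To illustrate the general idea of supervised NMF source separation mentioned above, the following is a minimal sketch and not the system described in the paper. It uses plain (non-convolutive) NMF rather than the convolutive variant, omits the noise dictionary adaptation, and assumes a KL-divergence multiplicative update and a Wiener-like soft mask for reconstruction. The function name `supervised_nmf_separate` and all parameter names are hypothetical.

```python
import numpy as np

def supervised_nmf_separate(V, W_speech, W_noise, n_iter=100, eps=1e-12, seed=0):
    """Split a mixture magnitude spectrogram V into speech and noise estimates.

    V        : (n_freq, n_frames) non-negative mixture spectrogram.
    W_speech : (n_freq, k_s) pre-trained speech dictionary, kept fixed.
    W_noise  : (n_freq, k_n) pre-trained noise dictionary, kept fixed.
    Assumption: plain NMF with KL-divergence updates, not convolutive NMF.
    """
    rng = np.random.default_rng(seed)
    W = np.hstack([W_speech, W_noise])            # joint dictionary
    H = rng.random((W.shape[1], V.shape[1])) + eps  # activations to estimate
    col_sums = W.sum(axis=0)[:, None] + eps       # W^T 1, broadcast over frames

    for _ in range(n_iter):
        WH = W @ H + eps
        # Multiplicative KL update for the activations only (W stays fixed).
        H *= (W.T @ (V / WH)) / col_sums

    k_s = W_speech.shape[1]
    S = W_speech @ H[:k_s]                        # speech part of the model
    N = W_noise @ H[k_s:]                         # noise part of the model
    mask = S / (S + N + eps)                      # Wiener-like soft mask
    return mask * V, (1.0 - mask) * V, H
```

In this sketch only the activations `H` are updated, which is what makes the factorisation "supervised": the dictionaries are assumed to have been learnt beforehand from isolated speech and noise material. The speech estimate would then be passed on to feature extraction for the recogniser.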
Index Terms. Non-Negative Matrix Factorisation, Tandem Speech Recognition
Bibliographic reference. Weninger, Felix / Geiger, Jürgen / Wöllmer, Martin / Schuller, Björn / Rigoll, Gerhard (2011): "The Munich 2011 CHiME challenge contribution: NMF-BLSTM speech enhancement and recognition for reverberated multisource environments", In CHiME-2011, 24-29.