11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Non-Negative Matrix Factorization Based Compensation of Music for Automatic Speech Recognition

Bhiksha Raj (1), Tuomas Virtanen (2), Sourish Chaudhuri (1), Rita Singh (1)

(1) Carnegie Mellon University, USA
(2) Tampere University of Technology, Finland

This paper proposes using non-negative matrix factorization (NMF) based speech enhancement for robust automatic recognition of speech mixed with music. We represent the magnitude spectra of noisy speech signals as non-negative weighted linear combinations of speech and noise spectral basis vectors that are obtained from training corpora of speech and music. We use overcomplete dictionaries consisting of random exemplars of the training data. The method is tested on the Wall Street Journal large-vocabulary speech corpus, which is artificially corrupted with polyphonic music from the RWC music database. Various music styles and speech-to-music ratios are evaluated. The proposed methods are shown to produce a consistent, significant improvement in recognition performance compared with the baseline method.
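The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the cost function or the reconstruction rule, so the generalized Kullback-Leibler multiplicative update and the Wiener-style soft mask below are assumptions, and the function names (sample_exemplars, estimate_activations, separate_speech) and all sizes are hypothetical. The dictionaries are held fixed and only the activations are estimated.

```python
import numpy as np


def sample_exemplars(train_spectra, n_atoms, rng):
    """Build a dictionary from randomly chosen training frames (columns).

    train_spectra: non-negative array, shape (n_bins, n_frames).
    The dictionary is overcomplete when n_atoms > n_bins.
    """
    idx = rng.choice(train_spectra.shape[1], size=n_atoms, replace=False)
    return train_spectra[:, idx]


def estimate_activations(V, W, n_iter=200, eps=1e-12):
    """Find non-negative H such that V ~= W @ H, with W kept fixed.

    Assumption: multiplicative updates for the generalized KL divergence;
    the abstract does not specify the cost function.
    """
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    col_sums = W.sum(axis=0)[:, None]  # W^T applied to an all-ones matrix
    for _ in range(n_iter):
        ratio = V / (W @ H + eps)
        H *= (W.T @ ratio) / (col_sums + eps)
    return H


def separate_speech(V, W_speech, W_music, n_iter=200, eps=1e-12):
    """Estimate the speech magnitude spectrogram from the mixture V.

    Assumption: the speech estimate is obtained with a soft mask built
    from the speech and music parts of the factorization.
    """
    W = np.hstack([W_speech, W_music])
    H = estimate_activations(V, W, n_iter, eps)
    k = W_speech.shape[1]
    speech_part = W_speech @ H[:k]
    music_part = W_music @ H[k:]
    mask = speech_part / (speech_part + music_part + eps)
    return mask * V


if __name__ == "__main__":
    # Toy data standing in for magnitude spectrograms (hypothetical sizes).
    rng = np.random.default_rng(1)
    speech_train = rng.random((8, 50))
    music_train = rng.random((8, 50))
    W_speech = sample_exemplars(speech_train, 20, rng)
    W_music = sample_exemplars(music_train, 20, rng)
    mixture = rng.random((8, 5))
    speech_estimate = separate_speech(mixture, W_speech, W_music)
    print(speech_estimate.shape)
```

In a recognition front end, the masked magnitude spectra would then be passed on to feature extraction. Because the soft mask lies between 0 and 1, the speech estimate never exceeds the mixture magnitude in any time-frequency bin.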

Bibliographic reference. Raj, Bhiksha / Virtanen, Tuomas / Chaudhuri, Sourish / Singh, Rita (2010): "Non-negative matrix factorization based compensation of music for automatic speech recognition", In INTERSPEECH-2010, 717-720.