This paper proposes non-negative matrix factorization based speech enhancement for robust automatic recognition of speech mixed with music. We represent the magnitude spectra of noisy speech signals as a non-negative weighted linear combination of speech and noise spectral basis vectors, which are obtained from training corpora of speech and music. We use overcomplete dictionaries consisting of random exemplars of the training data. The method is tested on the Wall Street Journal large-vocabulary speech corpus, artificially corrupted with polyphonic music from the RWC music database. Various music styles and speech-to-music ratios are evaluated. The proposed methods are shown to produce a consistent, significant improvement in recognition performance compared with the baseline method.
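The decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation: the multiplicative update for the generalized Kullback-Leibler divergence, the soft-mask reconstruction, the synthetic data, and all names (`nmf_activations`, `enhance`, `B_speech`, `B_music`) are assumptions chosen for illustration. The dictionaries are random columns drawn from stand-in training spectra and are held fixed while only the weights are estimated.

```python
import numpy as np

def nmf_activations(V, B, n_iter=200, eps=1e-12):
    """Estimate non-negative weights H with V ~= B @ H, keeping B fixed.

    Assumption: multiplicative updates for the generalized KL divergence;
    the abstract does not state which cost function is used.
    """
    rng = np.random.default_rng(0)
    H = rng.random((B.shape[1], V.shape[1])) + eps
    col_sums = B.sum(axis=0)[:, None] + eps  # B^T applied to an all-ones matrix
    for _ in range(n_iter):
        ratio = V / (B @ H + eps)
        H *= (B.T @ ratio) / col_sums
    return H

def enhance(V, B_speech, B_music, n_iter=200, eps=1e-12):
    """Return a speech magnitude estimate from the mixture spectrogram V."""
    B = np.hstack([B_speech, B_music])  # joint speech + music dictionary
    H = nmf_activations(V, B, n_iter, eps)
    k = B_speech.shape[1]
    speech_est = B_speech @ H[:k]
    music_est = B_music @ H[k:]
    # Assumption: a Wiener-like soft mask applied to the mixture.
    mask = speech_est / (speech_est + music_est + eps)
    return mask * V

# Toy data standing in for training spectra (frequency bins x frames).
rng = np.random.default_rng(1)
n_bins, n_train, n_atoms, n_frames = 32, 500, 40, 20
train_speech = rng.random((n_bins, n_train))
train_music = rng.random((n_bins, n_train))

# Overcomplete dictionaries: more atoms than frequency bins,
# built from randomly chosen training frames (exemplars).
B_speech = train_speech[:, rng.choice(n_train, n_atoms, replace=False)]
B_music = train_music[:, rng.choice(n_train, n_atoms, replace=False)]

# Synthetic mixture of "speech" and "music" magnitude spectra.
S = B_speech @ rng.random((n_atoms, n_frames))
M = B_music @ rng.random((n_atoms, n_frames))
V = S + M

S_hat = enhance(V, B_speech, B_music)
```

In a recognition pipeline, an estimate such as `S_hat` would be passed on to feature extraction in place of the noisy spectra; how the enhanced spectra feed the recognizer in the paper is not specified in this abstract.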
Bibliographic reference. Raj, Bhiksha / Virtanen, Tuomas / Chaudhuri, Sourish / Singh, Rita (2010): "Non-negative matrix factorization based compensation of music for automatic speech recognition", In INTERSPEECH-2010, 717-720.