12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Adaptation of Speaker-Specific Bases in Non-Negative Matrix Factorization for Single Channel Speech-Music Separation

Emad M. Grais, Hakan Erdogan

Sabancı Üniversitesi, Turkey

This paper introduces a speaker adaptation algorithm for nonnegative matrix factorization (NMF) models. The proposed algorithm combines Bayesian adaptation with subspace model adaptation. The adapted model is used to separate a speech signal from background music in a single-channel recording. NMF is applied to training speech data from multiple speakers to learn a set of basis vectors that serves as a general model for speech signals. The probabilistic interpretation of NMF is then used to perform Bayesian adaptation, which adjusts this general model toward the actual properties of the speech signal observed in the mixture. The Bayesian-adapted model is further adapted by a linear transform, which changes the subspace spanned by the model to better match the speech in the mixed signal. Experimental results show that combining Bayesian adaptation with linear transform adaptation improves separation performance.
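The abstract does not give update equations, so the following is only a minimal sketch of the unadapted baseline the method builds on: speech and music bases are trained separately with NMF, then held fixed while the mixture is decomposed. It assumes magnitude spectrograms, the generalized KL divergence with Lee and Seung multiplicative updates, and a Wiener-style soft mask; these choices and the helper names `kl_div`, `nmf_kl`, and `separate` are hypothetical, not taken from the paper. The Bayesian and linear-transform adaptation steps are not implemented here.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def kl_div(V, W):
    """Generalized KL divergence D(V || W), summed over all entries."""
    return float(np.sum(V * np.log((V + EPS) / (W + EPS)) - V + W))


def nmf_kl(V, n_bases=None, B=None, n_iter=200, seed=0):
    """Approximate V ~= B @ G with KL-divergence multiplicative updates.

    If B is None, both the bases B and the gains G are learned (training).
    If B is given, it is held fixed and only G is estimated (separation).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    learn_bases = B is None
    if learn_bases:
        B = rng.random((n_freq, n_bases)) + EPS
    G = rng.random((B.shape[1], n_frames)) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Gain update: G <- G * (B^T (V / BG)) / (B^T 1)
        G *= (B.T @ (V / (B @ G + EPS))) / (B.T @ ones + EPS)
        if learn_bases:
            # Basis update: B <- B * ((V / BG) G^T) / (1 G^T)
            B *= ((V / (B @ G + EPS)) @ G.T) / (ones @ G.T + EPS)
    return B, G


def separate(V_mix, B_speech, B_music, n_iter=200):
    """Split a mixture magnitude spectrogram using fixed, pre-trained bases."""
    B = np.hstack([B_speech, B_music])
    _, G = nmf_kl(V_mix, B=B, n_iter=n_iter)
    k = B_speech.shape[1]
    S_hat = B_speech @ G[:k]  # speech part of the NMF approximation
    M_hat = B_music @ G[k:]   # music part of the NMF approximation
    mask = S_hat / (S_hat + M_hat + EPS)  # Wiener-style soft mask in [0, 1)
    return mask * V_mix, (1.0 - mask) * V_mix


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Random nonnegative matrices stand in for magnitude spectrograms.
    V_speech = rng.random((64, 200))
    V_music = rng.random((64, 200))
    B_s, _ = nmf_kl(V_speech, n_bases=16)
    B_m, _ = nmf_kl(V_music, n_bases=16)
    V_mix = V_speech[:, :50] + V_music[:, :50]
    speech_est, music_est = separate(V_mix, B_s, B_m)
    print(speech_est.shape, music_est.shape)
```

In the setting the abstract describes, the adaptation steps would adjust `B_speech` to the speaker observed in the mixture before it is used for separation; in this sketch the generically trained bases are used unchanged.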


Bibliographic reference.  Grais, Emad M. / Erdogan, Hakan (2011): "Adaptation of speaker-specific bases in non-negative matrix factorization for single channel speech-music separation", In INTERSPEECH-2011, 569-572.