We propose a new catalog-based speech-music separation method for background music removal. Assuming that a catalog of the background music is known, we develop a generative model for the superposed speech and music spectrograms. We represent the speech spectrogram by a Non-negative Matrix Factorization (NMF) model and the music spectrogram by a conditional Poisson Mixture Model (PMM). By choosing the size of the catalog, i.e., the number of mixture components, we can trade off speed against accuracy. The combined hierarchical model leads to a mixture of multinomial distributions as the joint posterior of music and speech. Separation and hyper-parameter adaptation can be achieved via an Expectation Maximization (EM) algorithm. Experimental results show that the separation performance of the algorithm is promising. Furthermore, we show that incorporating prior information, such as a volume adjustment parameter, can enhance the separation performance.
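The paper itself derives a Poisson/multinomial posterior and an EM scheme; as a rough, simplified illustration of the general approach (an NMF speech model combined with a known music template, followed by soft-mask separation), here is a sketch using generalized KL-divergence NMF with multiplicative updates and Wiener-style masking. The function names (`nmf_kl`, `wiener_mask_separate`) and the use of a single fixed music template are our assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def nmf_kl(V, rank, n_iter=200, seed=0):
    """Factor a non-negative spectrogram V ~= W @ H by minimizing the
    generalized KL divergence with standard multiplicative updates.
    (Illustrative stand-in for the paper's NMF speech model.)"""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + 1e-3
    H = rng.random((rank, T)) + 1e-3
    eps = 1e-9
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0, keepdims=True).T + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1, keepdims=True).T + eps)
    return W, H

def wiener_mask_separate(V_mix, W_speech, H_speech, music_template):
    """Allocate the mixture spectrogram's energy to each source in
    proportion to its model reconstruction (Wiener-style soft masking).
    `music_template` stands in for the catalog entry matched by the PMM."""
    eps = 1e-9
    S_hat = W_speech @ H_speech          # speech model reconstruction
    total = S_hat + music_template + eps
    speech_est = V_mix * S_hat / total
    music_est = V_mix * music_template / total
    return speech_est, music_est
```

In the paper's setting, the music side is not a single template but a mixture over catalog entries, and the posterior over the speech/music decomposition is updated jointly with the hyper-parameters inside EM; this sketch only conveys the masking step that turns the two source models into separated spectrograms.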
Bibliographic reference. Demir, Cemil / Cemgil, A. Taylan / Saraçlar, Murat (2010): "Catalog-based single-channel speech-music separation", in INTERSPEECH-2010, pp. 2782-2785.