SAPA-SCALE Conference 2012

Portland, OR, USA
September 7-8, 2012

Log-Normal Matrix Factorization with Application to Speech-Music Separation

Takuya Yoshioka, Daichi Sakaue

NTT Communication Science Laboratories, NTT Corporation, Kyoto, Japan

This paper proposes a novel spectrogram factorization method, called log-normal matrix factorization (LogNMF). Conventional nonnegative matrix factorization (NMF) methods cannot efficiently capture the random properties of actual spectra because they assume that speech and noise spectrograms can be precisely represented by combining a small number of temporally invariant spectral patterns, called basis vectors. This limitation results in unsatisfactory performance when NMF is used for speech enhancement. The proposed method overcomes this limitation by allowing each basis vector to change randomly at each time frame according to a log-normal distribution. The log-normal distribution is also desirable because the divergence between an observed spectrogram and a spectrogram model is then measured by the squared errors of log power spectra, which are subjectively meaningful. Experimental results show that LogNMF separates speech signals from background music signals more precisely than NMF.
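The modeling idea described above can be illustrated with a small sketch. It is not the authors' inference algorithm, and the abstract does not give the exact parameterization: the function names, the single scale parameter `sigma`, and the toy matrices `W` and `H` are all assumptions made for illustration. The sketch draws a spectrogram in which every basis entry is perturbed by an independent log-normal factor at each frame, and it computes a squared error between log power spectra.

```python
import math
import random


def sample_spectrogram(W, H, sigma, rng):
    """Draw a power spectrogram from bases W (F x K) and activations H (K x T).

    With sigma = 0 this reduces to the ordinary NMF model X = W H.
    With sigma > 0 each basis entry is multiplied, independently at every
    frame, by exp(sigma * n) with n ~ N(0, 1). This parameterization is an
    assumption; the abstract only states that the bases vary log-normally.
    """
    F, K, T = len(W), len(H), len(H[0])
    X = [[0.0] * T for _ in range(F)]
    for t in range(T):
        for f in range(F):
            total = 0.0
            for k in range(K):
                # Frame-specific, log-normally perturbed basis entry.
                perturbed = W[f][k] * math.exp(sigma * rng.gauss(0.0, 1.0))
                total += perturbed * H[k][t]
            X[f][t] = total
    return X


def log_spectral_sq_error(X, Y):
    """Sum of squared differences between the log power spectra of X and Y."""
    return sum(
        (math.log(x) - math.log(y)) ** 2
        for row_x, row_y in zip(X, Y)
        for x, y in zip(row_x, row_y)
    )


# Hypothetical toy example: 3 frequency bins, 2 basis vectors, 3 frames.
W = [[1.0, 0.1], [0.2, 1.0], [0.5, 0.5]]
H = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]
rng = random.Random(0)

clean = sample_spectrogram(W, H, 0.0, rng)   # time-invariant bases (plain NMF)
noisy = sample_spectrogram(W, H, 0.3, rng)   # bases fluctuate frame by frame
error = log_spectral_sq_error(clean, noisy)
```

Setting `sigma` to zero recovers the fixed-basis NMF product, so the nonzero `error` in the toy example reflects only the frame-by-frame fluctuation that a fixed-basis model cannot represent.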

Index Terms: matrix factorization, log-normal distribution, speech enhancement


Bibliographic reference. Yoshioka, Takuya / Sakaue, Daichi (2012): "Log-normal matrix factorization with application to speech-music separation", in SAPA-SCALE-2012, 80-85.