Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Single-Channel Speech Separation Using Sparse Non-Negative Matrix Factorization

Mikkel N. Schmidt, Rasmus K. Olsson

Technical University of Denmark, Denmark

We apply machine learning techniques to the problem of separating multiple speech sources from a single microphone recording. The method of choice is a sparse non-negative matrix factorization algorithm, which can learn sparse representations of the data in an unsupervised manner. The algorithm is used to learn personalized dictionaries from a speech corpus, which in turn are used to separate the audio stream into its components. We show that computational savings can be achieved by segmenting the training data on a phoneme level, using a conventional speech recognizer to split the data. Both the unsupervised and the supervised adaptation schemes yield significant improvements in the target-to-masker ratio.
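The dictionary-then-separate idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes magnitude spectrograms as input, a Euclidean cost with an L1 penalty on the activations, and plain multiplicative updates with column-normalized dictionaries. The names `sparse_nmf`, `separate`, `n_atoms`, and `sparsity` are hypothetical and chosen only for illustration.

```python
import numpy as np

def sparse_nmf(V, n_atoms, sparsity=0.1, n_iter=200, W=None, seed=0, eps=1e-9):
    """Approximate V ~= W @ H with non-negative W, H and an L1 penalty on H.

    Assumption: simplified multiplicative updates, not the paper's exact rule.
    If W is supplied it is held fixed and only H is estimated.
    """
    rng = np.random.default_rng(seed)
    learn_W = W is None
    if learn_W:
        # Random non-negative dictionary with unit-norm columns (atoms).
        W = rng.random((V.shape[0], n_atoms)) + eps
        W /= np.linalg.norm(W, axis=0, keepdims=True)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        # The sparsity weight in the denominator shrinks the activations.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + eps)
        if learn_W:
            W *= (V @ H.T) / (W @ (H @ H.T) + eps)
            # Renormalize atoms so the penalty cannot be dodged by scaling W up.
            W /= np.linalg.norm(W, axis=0, keepdims=True) + eps
    return W, H

def separate(V_mix, W_a, W_b, sparsity=0.1, n_iter=200):
    """Split a mixture spectrogram using two fixed per-speaker dictionaries."""
    W = np.hstack([W_a, W_b])            # concatenated dictionary
    _, H = sparse_nmf(V_mix, W.shape[1], sparsity, n_iter, W=W)
    k = W_a.shape[1]
    # Each source estimate uses only that speaker's atoms and activations.
    return W_a @ H[:k], W_b @ H[k:]
```

In this sketch, one dictionary would be learned per speaker by calling `sparse_nmf` on that speaker's training spectrogram, and `separate` would then be applied to the spectrogram of the mixed recording. Converting the estimates back to audio (for example, by reusing the mixture phase) is left out.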


Bibliographic reference. Schmidt, Mikkel N. / Olsson, Rasmus K. (2006): "Single-channel speech separation using sparse non-negative matrix factorization", in INTERSPEECH-2006, paper 1652-Thu2FoP.10.