8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Factorial Models and Refiltering for Speech Separation and Denoising

Sam T. Roweis

University of Toronto, Canada

This paper combines several ideas, some old and some new, from machine learning and speech processing. We review the max approximation to log spectrograms of mixtures, show why this motivates a "refiltering" approach to separation and denoising, and then describe how inference in factorial probabilistic models performs a computation useful for deriving the masking signals needed in refiltering. We introduce a particularly simple model, factorial-max vector quantization (MAXVQ), along with a branch-and-bound technique for efficient exact inference, and apply it to both denoising and monaural separation. Our approach represents a return to the ideas of Ephraim, Varga and Moore, applied to auditory scene analysis rather than to speech recognition.
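
The idea of inferring a masking signal from a factorial-max model can be illustrated with a small sketch. This is not the paper's implementation. It assumes two toy codebooks of log-spectrum vectors (the names `codebook_a`, `codebook_b` and the function `maxvq_infer` are hypothetical), scores each pair of codewords by the squared error between the observation and their elementwise max, and uses plain exhaustive search in place of the branch-and-bound technique mentioned above.

```python
from itertools import product


def maxvq_infer(x, codebook_a, codebook_b):
    """Find the codeword pair whose elementwise max best fits x.

    Assumption: fit is measured by squared error, and the search is
    exhaustive (a stand-in for branch-and-bound, not a copy of it).
    Returns (index_a, index_b, mask, error), where mask[k] is True
    when source A dominates frequency bin k.
    """
    best = None
    for (i, a), (j, b) in product(enumerate(codebook_a), enumerate(codebook_b)):
        # Max approximation: the log spectrum of a mixture is roughly
        # the elementwise max of the sources' log spectra.
        pred = [max(ak, bk) for ak, bk in zip(a, b)]
        err = sum((xk - pk) ** 2 for xk, pk in zip(x, pred))
        if best is None or err < best[0]:
            best = (err, i, j)
    err, i, j = best
    # Binary mask for refiltering: keep the bins where source A wins.
    mask = [ak >= bk for ak, bk in zip(codebook_a[i], codebook_b[j])]
    return i, j, mask, err


# Toy data (made up for illustration): three frequency bins per vector.
codebook_a = [[5.0, 1.0, 0.0], [0.0, 4.0, 1.0]]
codebook_b = [[0.0, 0.0, 6.0], [3.0, 3.0, 3.0]]
x = [5.1, 0.9, 6.2]

i, j, mask, err = maxvq_infer(x, codebook_a, codebook_b)
print(i, j, mask)
```

In this toy case the mask assigns the first two bins to source A and the last to source B. In a refiltering setting, such a mask would be applied to the mixture's spectrogram to recover one source. Exhaustive search costs the product of the codebook sizes, which is why an exact but pruned search is attractive for larger codebooks.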


Bibliographic reference. Roweis, Sam T. (2003): "Factorial models and refiltering for speech separation and denoising", in EUROSPEECH-2003, 1009-1012.