EUROSPEECH 2003 - INTERSPEECH 2003
This paper combines several ideas, some old and some new, from machine learning and speech processing. We review the max approximation to log spectrograms of mixtures, show why it motivates a "refiltering" approach to separation and denoising, and then describe how inference in factorial probabilistic models performs a computation useful for deriving the masking signals needed in refiltering. A particularly simple model, factorial-max vector quantization (MAXVQ), is introduced along with a branch-and-bound technique for efficient exact inference, and is applied to both denoising and monaural separation. Our approach represents a return to the ideas of Ephraim, Varga and Moore, but applied to auditory scene analysis rather than to speech recognition.
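As a rough illustration of the ideas named in the abstract, the following Python sketch shows (a) how close the max approximation is to exact log-domain mixing and (b) how inferring one codeword per source yields a binary mask for refiltering. It is a toy under stated assumptions, not the paper's implementation: the codebooks and the names `log_sum` and `maxvq_infer` are hypothetical, squared error is assumed as the fit criterion, and the search is plain brute force over all codeword pairs rather than the branch-and-bound technique the paper describes.

```python
import math
from itertools import product


def log_sum(a, b):
    """Exact log-energy of two summed sources: log(e^a + e^b)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def maxvq_infer(x, codebook_a, codebook_b):
    """Find the codeword pair whose elementwise max best fits x.

    Brute-force stand-in (assumption) for exact inference: every pair
    (i, j) is scored by squared error between x and max(a_i, b_j).
    Returns (error, i, j, mask), where mask[k] == 1 means source A is
    predicted to dominate frequency bin k.
    """
    best = None
    pairs = product(enumerate(codebook_a), enumerate(codebook_b))
    for (i, ca), (j, cb) in pairs:
        pred = [max(u, v) for u, v in zip(ca, cb)]
        err = sum((p - o) ** 2 for p, o in zip(pred, x))
        if best is None or err < best[0]:
            mask = [1 if u >= v else 0 for u, v in zip(ca, cb)]
            best = (err, i, j, mask)
    return best


# Hypothetical toy codebooks of 3-bin log spectra, one per source.
codebook_a = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
codebook_b = [[0.0, 0.0, 4.0], [3.0, 0.0, 3.0]]

# Observed mixture: exact log-domain sum of A's codeword 1 and B's codeword 0.
x = [log_sum(u, v) for u, v in zip(codebook_a[1], codebook_b[0])]

err, i, j, mask = maxvq_infer(x, codebook_a, codebook_b)
print("inferred pair:", (i, j), "mask for source A:", mask)
```

The gap between `log_sum(a, b)` and `max(a, b)` never exceeds log 2, and it shrinks quickly as the two log-energies move apart, which is why the elementwise max is a reasonable model of a mixture's log spectrogram. In a refiltering setting, the inferred mask would be applied to the mixture's spectrogram frames to keep only the bins attributed to one source.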
Bibliographic reference. Roweis, Sam T. (2003): "Factorial models and refiltering for speech separation and denoising", In EUROSPEECH-2003, 1009-1012.