8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Smooth Soft Mel-Spectrographic Masks Based on Blind Sparse Source Separation

Marco Kühne (1), Roberto Togneri (1), Sven Nordholm (2)

(1) University of Western Australia, Australia
(2) WATRI, Australia

This paper investigates the use of DUET, a recently proposed blind source separation method, as a front-end for missing data speech recognition. Based on attenuation and delay estimates from stereo signals, soft time-frequency masks are designed to extract a target speaker from a mixture containing multiple speech sources. A postprocessing step is introduced to remove isolated mask points that can cause insertion errors in the speech decoder. Results for connected digit experiments in a multi-speaker environment demonstrate that the proposed soft masks closely match the performance of the oracle mask designed with a priori knowledge of the source spectra.
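
The ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the soft masks or the postprocessing step are computed. The function names, the Gaussian weighting around assumed per-source (attenuation, delay) peaks, the 8-neighbour isolation test, and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np


def duet_features(X1, X2, omega, eps=1e-12):
    """Per-bin relative attenuation and delay between two stereo STFTs.

    X1, X2: complex arrays of shape (n_freq, n_frames).
    omega:  angular frequencies, shape (n_freq,), all non-zero.
    Assumes the anechoic model X2 = a * exp(-1j * omega * delta) * X1.
    """
    ratio = (X2 + eps) / (X1 + eps)
    attenuation = np.abs(ratio)
    delay = -np.angle(ratio) / omega[:, None]
    return attenuation, delay


def soft_mask(attenuation, delay, peaks, target, sigma_a=0.2, sigma_d=0.5):
    """Soft mask for one source (hypothetical Gaussian weighting).

    peaks:  list of (attenuation, delay) pairs, one per source.
    target: index of the source to extract.
    """
    weights = []
    for a_j, d_j in peaks:
        dist = ((attenuation - a_j) / sigma_a) ** 2 + ((delay - d_j) / sigma_d) ** 2
        weights.append(np.exp(-0.5 * dist))
    weights = np.stack(weights)
    # Normalise across sources so each mask value lies in [0, 1].
    return weights[target] / (weights.sum(axis=0) + 1e-12)


def remove_isolated_points(mask, threshold=0.5, min_neighbors=1):
    """Zero out active mask points that have too few active neighbours.

    A point is "active" if its value exceeds `threshold`; neighbours are
    the 8 surrounding time-frequency cells (an assumed definition).
    """
    active = mask > threshold
    padded = np.pad(active, 1).astype(int)  # zero-pad the borders
    n_f, n_t = mask.shape
    neighbors = np.zeros(mask.shape, dtype=int)
    for df in (-1, 0, 1):
        for dt in (-1, 0, 1):
            if df == 0 and dt == 0:
                continue
            neighbors += padded[1 + df:1 + df + n_f, 1 + dt:1 + dt + n_t]
    cleaned = mask.copy()
    cleaned[active & (neighbors < min_neighbors)] = 0.0
    return cleaned
```

In this sketch, a single high-valued mask cell surrounded by inactive cells is set to zero, while clusters of adjacent active cells are left unchanged. In a missing data recognizer, such stray cells would otherwise be treated as reliable speech evidence.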

Bibliographic reference. Kühne, Marco / Togneri, Roberto / Nordholm, Sven (2007): "Smooth soft mel-spectrographic masks based on blind sparse source separation", in INTERSPEECH-2007, 918-921.