This paper investigates the use of DUET, a recently proposed blind source separation method, as a front-end for missing-data speech recognition. Soft time-frequency masks are designed from attenuation and delay estimates in stereo signals to extract a target speaker from a mixture containing multiple speech sources. A postprocessing step is introduced to remove isolated mask points that can cause insertion errors in the speech decoder. Results for connected-digit experiments in a multi-speaker environment demonstrate that the proposed soft masks closely match the performance of the oracle mask designed with a priori knowledge of the source spectra.
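The pipeline summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the mask design, so the Gaussian weighting, the function names (`duet_features`, `soft_mask`, `remove_isolated_points`), and all parameters (`sigma_a`, `sigma_d`, `threshold`, `min_neighbors`) are assumptions chosen for illustration. The sketch also works on plain STFT bins rather than the mel-spectrographic domain named in the paper title, and it assumes the per-source attenuation and delay estimates (`centers`) are already available.

```python
import numpy as np


def duet_features(X1, X2, omega, eps=1e-12):
    """Per-bin relative attenuation and delay between two channel STFTs.

    X1, X2: complex arrays of shape (freq, time).
    omega:  angular frequency of each bin in rad/sample, shape (freq,).
    """
    ratio = (X2 + eps) / (X1 + eps)
    attenuation = np.abs(ratio)
    # Avoid dividing by zero at the DC bin; its delay is set to 0.
    safe_omega = np.where(omega == 0, np.inf, omega)[:, None]
    delay = -np.angle(ratio) / safe_omega
    return attenuation, delay


def soft_mask(attenuation, delay, centers, target, sigma_a=0.2, sigma_d=1.0):
    """Soft mask in [0, 1] for the target source (assumed Gaussian weighting).

    centers: list of (attenuation, delay) estimates, one pair per source.
    target:  index into centers of the speaker to extract.
    """
    scores = []
    for a_k, d_k in centers:
        dist = ((attenuation - a_k) / sigma_a) ** 2 + ((delay - d_k) / sigma_d) ** 2
        scores.append(np.exp(-0.5 * dist))
    scores = np.stack(scores)
    # Normalize so each bin's weights over all sources sum to (almost) one.
    return scores[target] / (scores.sum(axis=0) + 1e-12)


def remove_isolated_points(mask, threshold=0.5, min_neighbors=1):
    """Zero out active mask points that have too few active 8-neighbors."""
    active = mask > threshold
    rows, cols = active.shape
    padded = np.pad(active, 1).astype(int)  # zero-padded border
    neighbors = sum(
        padded[1 + di:1 + di + rows, 1 + dj:1 + dj + cols]
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if (di, dj) != (0, 0)
    )
    cleaned = mask.copy()
    cleaned[active & (neighbors < min_neighbors)] = 0.0
    return cleaned
```

In this sketch the cleaned mask would then be passed to the missing-data recognizer as a per-bin reliability weight. The 8-neighborhood test is only one simple way to define an "isolated" point; other neighborhood sizes or smoothing filters would serve the same purpose.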
Bibliographic reference. Kühne, Marco / Togneri, Roberto / Nordholm, Sven (2007): "Smooth soft mel-spectrographic masks based on blind sparse source separation", In INTERSPEECH-2007, 918-921.