Missing feature theory (MFT) has been proposed to improve speaker recognition performance in noisy environments. MFT-based speaker recognition requires a binary mask that identifies the reliable and unreliable feature components. In this paper, a dual-microphone, semi-blind Degenerate Unmixing Estimation Technique (DUET) approach is proposed to estimate the binary mask. By using spatial information instead of the conventional noise statistics, the proposed approach estimates the mask well, especially when the noise is non-stationary, e.g., interfering speech or music. Experimental results show that the proposed method achieves significant improvements over alternative approaches. We further refine the estimated binary mask by removing the unreliable time frames and non-discriminative frequency subbands. Experiments demonstrate that the refined binary mask enhances the performance of MFT-based speaker verification and represents a promising direction for MFT-based applications.
Index Terms: speaker verification, missing feature theory, dual-microphone, binary mask estimation
Bibliographic reference. Zhao, Yali / Xie, Lie / Fu, Zhonghua (2012): "Mask estimation and refinement for MFT-based robust speaker verification", In INTERSPEECH-2012, 2654-2657.
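
Illustrative sketch. The abstract does not give the details of the semi-blind DUET estimator, so the following is only a minimal sketch of the general DUET idea under stated assumptions, not the authors' method. It assumes the two microphone signals are already available as STFT coefficients, that the target speaker's inter-microphone delay is known in advance (one possible reading of "semi-blind"), and that the microphone spacing is small enough for the phase difference not to wrap. The function name `duet_binary_mask` and the parameters `target_delay_s` and `tol_s` are hypothetical.

```python
import cmath
import math


def duet_binary_mask(X1, X2, freqs_hz, target_delay_s, tol_s=1e-4, eps=1e-12):
    """Label each time-frequency bin as reliable (1) or unreliable (0).

    X1, X2: lists of frames; each frame is a list of complex STFT
            coefficients, one per entry of freqs_hz (hypothetical layout).
    target_delay_s: assumed known inter-microphone delay of the target.
    tol_s: assumed tolerance around the target delay, in seconds.
    """
    mask = []
    for frame1, frame2 in zip(X1, X2):
        row = []
        for x1, x2, f in zip(frame1, frame2, freqs_hz):
            # No delay estimate is possible at DC or for an empty bin.
            if f <= 0 or abs(x1) < eps:
                row.append(0)
                continue
            # DUET model: X2 / X1 = a * exp(-j * 2*pi*f * delay),
            # so the delay follows from the phase of the ratio.
            delay = -cmath.phase(x2 / x1) / (2 * math.pi * f)
            # A bin is kept only if its delay matches the target's.
            row.append(1 if abs(delay - target_delay_s) <= tol_s else 0)
        mask.append(row)
    return mask
```

In this sketch, a bin dominated by an interferer arriving from another direction yields a different delay estimate and is marked unreliable, which is how spatial information can stand in for noise statistics when the noise is non-stationary.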