This paper presents a new method of zero-crossing-based binaural mask estimation for sound segregation when multiple sound sources are present simultaneously. The mask is determined from the estimated sound source directions, using spatial cues such as inter-aural time differences (ITDs) and inter-aural intensity differences (IIDs). In the proposed method, ITDs are estimated from the statistical properties of zero-crossings detected in the binaural filter-bank outputs. We also consider estimating ITDs with the aid of IID samples, to cope with the phase ambiguities of ITD samples at high frequencies. For masking, we use the target-to-total power ratio in each segment of the time-frequency domain, and we show that this power ratio is optimal from the viewpoint of reconstructing the target speech signal. As a result, the proposed method provides an accurate estimate of the sound source directions and a good masking scheme for speech segregation, while requiring significantly less computation than cross-correlation-based methods.
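The two ingredients named in the abstract can be illustrated with a minimal Python sketch, which is not the authors' implementation. The function names (`upward_zero_crossings`, `estimate_itd`, `ratio_mask`) are hypothetical. Pairing each left-ear crossing with the nearest right-ear crossing and taking the median are stand-in assumptions for the statistical treatment the paper describes. The target and interference powers are assumed to be already known for each time-frequency segment, whereas the paper derives the mask from estimated source directions. The IID-aided disambiguation at high frequencies is not covered.

```python
import numpy as np

def upward_zero_crossings(x, fs):
    """Times (in seconds) of negative-to-positive zero crossings of x,
    refined by linear interpolation between the two bracketing samples."""
    x = np.asarray(x, dtype=float)
    idx = np.where((x[:-1] < 0) & (x[1:] >= 0))[0]
    frac = -x[idx] / (x[idx + 1] - x[idx])  # denominator is always > 0 here
    return (idx + frac) / fs

def estimate_itd(left, right, fs):
    """ITD estimate for one filter-bank channel (positive: right ear lags).

    Assumption: each left crossing is paired with the nearest right crossing,
    which is only unambiguous when |ITD| is below half the channel period.
    """
    tl = upward_zero_crossings(left, fs)
    tr = upward_zero_crossings(right, fs)
    if tl.size == 0 or tr.size == 0:
        return float("nan")
    d = tr[None, :] - tl[:, None]  # all pairwise time differences
    nearest = d[np.arange(tl.size), np.abs(d).argmin(axis=1)]
    return float(np.median(nearest))  # median as a simple robust statistic

def ratio_mask(target_power, interference_power, eps=1e-12):
    """Target-to-total power ratio per time-frequency segment, in [0, 1]."""
    target_power = np.asarray(target_power, dtype=float)
    total = target_power + np.asarray(interference_power, dtype=float)
    return target_power / np.maximum(total, eps)

# Illustrative use: a 500 Hz channel with a 0.3 ms delay imposed on the right ear.
fs = 16000
t = np.arange(0, 0.05, 1 / fs)
left = np.sin(2 * np.pi * 500 * t)
right = np.sin(2 * np.pi * 500 * (t - 0.0003))
itd = estimate_itd(left, right, fs)
mask = ratio_mask([1.0, 0.0, 3.0], [1.0, 2.0, 1.0])
```

For the synthetic tone, `itd` should recover approximately the imposed 0.3 ms delay. The ratio form of the mask is the gain that minimizes the mean squared reconstruction error when target and interference are uncorrelated, which is presumably the sense of optimality the abstract refers to.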
Bibliographic reference. An, Sung Jun / Kim, Young-Ik / Kil, Rhee Man (2007): "Zero-crossing-based ratio masking for sound segregation", In INTERSPEECH-2007, 1945-1948.