SAPA-SCALE Conference 2012
Portland, OR, USA
This paper addresses the problem of separating concurrent speech through a spatial filtering stage followed by a time-frequency masking stage. The two stages complement each other: the first exploits spatial diversity, and the second makes use of the fact that different speech signals rarely occupy the same frequency bins at the same time. The novelty of the paper lies in the use of auditory-motivated log-sigmoid masks, whose scale parameters are optimized to maximize the kurtosis of the separated speech. Experiments on the Pascal Speech Separation Challenge II show significant improvements over previous approaches with binary masks.
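The masking idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the mask formula or the optimizer, so everything below is an assumption made for illustration: the mask is taken to be a sigmoid of the log-magnitude ratio between a target and an interferer beamformer output, the scale is chosen by a plain grid search (not necessarily the paper's optimization method), and the names `sigmoid_mask`, `kurtosis`, and `best_scale` are hypothetical.

```python
import math


def sigmoid_mask(target_mag, interferer_mag, scale, eps=1e-12):
    """Soft mask in [0, 1] from the log-magnitude ratio of two outputs.

    Assumed form, not taken from the paper: sigmoid(scale * log ratio).
    A large scale approaches a binary mask; a small scale gives a soft one.
    """
    z = scale * (math.log(target_mag + eps) - math.log(interferer_mag + eps))
    # Numerically stable sigmoid for both signs of z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def kurtosis(samples):
    """Non-excess sample kurtosis: fourth central moment / variance squared."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    if var == 0.0:
        return 0.0  # degenerate case: constant signal
    fourth = sum((x - mean) ** 4 for x in samples) / n
    return fourth / (var ** 2)


def best_scale(target_mags, interferer_mags, candidate_scales):
    """Grid search (an assumption) for the scale maximizing masked kurtosis."""
    def score(scale):
        masked = [
            sigmoid_mask(t, i, scale) * t
            for t, i in zip(target_mags, interferer_mags)
        ]
        return kurtosis(masked)

    return max(candidate_scales, key=score)


# Toy magnitudes for one frequency bin over a few frames (made-up values).
target = [0.9, 0.1, 1.2, 0.05, 0.8, 0.02]
interferer = [0.1, 0.7, 0.2, 0.9, 0.1, 0.6]
chosen = best_scale(target, interferer, [0.5, 1.0, 2.0, 4.0, 8.0])
```

Kurtosis serves as the objective here because clean speech is typically more super-Gaussian than a mixture of speakers, so a higher kurtosis of the masked output is read as better separation.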
Index Terms: speech recognition, microphone arrays, time-frequency masking, kurtosis maximization
Bibliographic reference. Toroghi, Rahil Mahdian / Faubel, Friedrich / Klakow, Dietrich (2012): "Multi-channel speech separation with soft time-frequency masking", In SAPA-SCALE-2012, 86-91.