SAPA-SCALE Conference 2012

Portland, OR, USA
September 7-8, 2012

Multi-Channel Speech Separation with Soft Time-Frequency Masking

Rahil Mahdian Toroghi, Friedrich Faubel, Dietrich Klakow

Spoken Language Systems, Saarland University, Saarbrücken, Germany

This paper addresses the problem of separating concurrent speech with a spatial filtering stage followed by a time-frequency masking stage. The two stages complement each other: the first exploits spatial diversity, and the second makes use of the fact that different speech signals rarely occupy the same frequency bins at the same time. The novelty of the paper lies in the use of auditory-motivated log-sigmoid masks, whose scale parameters are optimized to maximize the kurtosis of the separated speech. Experiments on the Pascal Speech Separation Challenge II show significant improvements compared to previous approaches with binary masks.
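
As a rough illustration of the masking stage described above, the following Python sketch builds a sigmoid mask from the log-magnitude ratio of two spatially filtered channels and picks the scale parameter that maximizes a kurtosis measure of the masked output. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the exact mask formula, the normalized fourth-moment kurtosis, and the grid search over candidate scales are all assumptions made for this example.

```python
import numpy as np

def sigmoid_mask(y_target, y_interf, scale, eps=1e-12):
    """Soft mask in [0, 1] computed from the log-magnitude ratio.

    y_target, y_interf: complex STFT arrays (frequency x time) taken from
    two spatial filter outputs (hypothetical inputs for this sketch).
    scale: slope of the sigmoid; larger values approach a binary mask.
    """
    log_ratio = np.log(np.abs(y_target) + eps) - np.log(np.abs(y_interf) + eps)
    # Numerically stable form of 1 / (1 + exp(-scale * log_ratio)).
    return 0.5 * (1.0 + np.tanh(0.5 * scale * log_ratio))

def normalized_kurtosis(x, eps=1e-12):
    """Fourth moment over squared second moment (assumed kurtosis measure)."""
    power = np.abs(x) ** 2
    return np.mean(power ** 2) / (np.mean(power) ** 2 + eps)

def pick_scale(y_target, y_interf, scales):
    """Grid search for the scale whose masked output has maximal kurtosis."""
    scores = [
        normalized_kurtosis(sigmoid_mask(y_target, y_interf, s) * y_target)
        for s in scales
    ]
    best = int(np.argmax(scores))
    return scales[best], scores[best]
```

With a scale of 0 the mask is 0.5 everywhere, and as the scale grows it approaches the binary mask used in earlier approaches. A grid search is used here only for simplicity; the abstract does not state which optimizer the paper uses.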

Index Terms: speech recognition, microphone arrays, time-frequency masking, kurtosis maximization

Bibliographic reference.  Toroghi, Rahil Mahdian / Faubel, Friedrich / Klakow, Dietrich (2012): "Multi-channel speech separation with soft time-frequency masking", In SAPA-SCALE-2012, 86-91.