INTERSPEECH 2012

In this paper, we attempt to generalize ideal binary mask (IBM) estimation to ideal ratio mask (IRM) estimation. Under binary masking, errors in IBM estimation can greatly distort the original speech spectrum. The main purpose of this paper is to use a ratio mask to smooth out this negative impact. Since the key issue is noise tracking, we first use exponential distributions to model the noise power, conditioned on the binary mask and the mixture power. We then use a Gaussian distribution to model the correlation of the noise estimates between adjacent time-frequency (TF) units. Because the IBM can be estimated correctly for the majority of units, the correlation model reduces the impact of IBM estimation errors. Systematic experiments show that our algorithm outperforms a common binary-masking-based method in terms of SNR gain and PESQ scores.
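
To illustrate the difference between the two masks, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes the commonly used definitions: the IBM is 1 where the local SNR of a TF unit exceeds a local criterion (here 0 dB) and 0 otherwise, and the IRM is the ratio of speech power to the sum of speech and noise power. The function names, the `lc_db` parameter, and the example values are hypothetical, and the "ideal" masks are computed from known speech and noise powers, which are not available in practice.

```python
import math

def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Return 1.0 where the local SNR exceeds lc_db, else 0.0.

    Assumes all powers are strictly positive.
    """
    mask = []
    for s, n in zip(speech_power, noise_power):
        snr_db = 10.0 * math.log10(s / n)
        mask.append(1.0 if snr_db > lc_db else 0.0)
    return mask

def ideal_ratio_mask(speech_power, noise_power):
    """Return speech power divided by (speech + noise) power per unit."""
    return [s / (s + n) for s, n in zip(speech_power, noise_power)]

# Hypothetical powers for three TF units (arbitrary linear units).
speech = [4.0, 1.0, 0.25]
noise = [1.0, 1.0, 1.0]

ibm = ideal_binary_mask(speech, noise)
irm = ideal_ratio_mask(speech, noise)
print(ibm)
print(irm)
```

Under these assumptions, a unit with equal speech and noise power is discarded entirely by the binary mask but retains half its power under the ratio mask, which is why a wrong binary decision distorts the spectrum more than a slightly wrong ratio.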
Index Terms: Ideal Binary Mask, Ideal Ratio Mask, Markov Chain Monte Carlo, Bayesian rule
Bibliographic reference. Liang, Shan / Jiang, Wei / Liu, Wenju (2012): "A new noise-tracking algorithm for generalizing binary time-frequency (t-f) masking to ratio masking", In INTERSPEECH-2012, 951-954.