A Mask Estimation Method Integrating Data Field Model for Speech Enhancement

Xianyun Wang, Changchun Bao, Feng Bao

In most approaches based on computational auditory scene analysis (CASA), the ideal binary mask (IBM) is used as the target for noise reduction. In practice, however, the IBM can only be estimated, and estimation errors may severely violate the smooth evolution of speech because energy is lost in many speech-dominated time-frequency (T-F) units. To reduce this error, the ideal ratio mask (IRM), obtained by modeling the spatial dependencies of the speech spectrum, is used as the optimal target mask, since a predicted ratio mask is less sensitive to estimation error than a predicted binary mask. In this paper, we introduce a data field (DF) to model the spatial dependencies of the cochleagram and obtain the ratio mask. First, initial T-F units of noise and speech are obtained from the noisy speech. Then the forms of the potentials of noise and speech are calculated. Subsequently, their optimal potentials, which reflect the respective distributions of the potential field, are obtained using the optimal influence factors of speech and noise. Finally, the potentials of speech and noise are used to obtain the ratio mask. Experimental results show that the proposed method achieves better speech quality than the reference methods.
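As a rough illustration of the last two steps, the sketch below computes a ratio mask from speech and noise potentials on a small T-F grid. It is not the authors' implementation: the Gaussian-like kernel `mass * exp(-(d / sigma)**2)` is the form commonly used in the data field literature and is assumed here, as is the mask formula `ps / (ps + pn)`. The names `potential`, `ratio_mask`, `sigma_s`, and `sigma_n` are hypothetical, and the influence factors are passed in directly rather than optimized.

```python
import math


def potential(units, point, sigma):
    """Data-field potential at `point` produced by a list of (t, f, mass) units.

    Assumption: Gaussian-like kernel mass * exp(-(d / sigma)**2), where d is
    the Euclidean distance on the T-F grid. The paper's kernel may differ.
    """
    t0, f0 = point
    total = 0.0
    for t, f, mass in units:
        d = math.hypot(t - t0, f - f0)
        total += mass * math.exp(-(d / sigma) ** 2)
    return total


def ratio_mask(speech_units, noise_units, shape, sigma_s, sigma_n, eps=1e-12):
    """Return a mask[t][f] in [0, 1] built from speech and noise potentials.

    Assumption: mask = speech potential / (speech + noise potential).
    sigma_s and sigma_n stand in for the influence factors of speech and
    noise; here they are supplied by the caller, not optimized.
    """
    n_frames, n_channels = shape
    mask = []
    for t in range(n_frames):
        row = []
        for f in range(n_channels):
            ps = potential(speech_units, (t, f), sigma_s)
            pn = potential(noise_units, (t, f), sigma_n)
            row.append(ps / (ps + pn + eps))  # eps avoids division by zero
        mask.append(row)
    return mask


# Toy example: one speech-dominated unit at (0, 0), one noise unit at (1, 1).
speech_units = [(0, 0, 1.0)]
noise_units = [(1, 1, 1.0)]
mask = ratio_mask(speech_units, noise_units, shape=(2, 2), sigma_s=1.0, sigma_n=1.0)
```

In this toy case the mask is above 0.5 next to the speech unit, below 0.5 next to the noise unit, and close to 0.5 at the two units equidistant from both, which is the smooth, spatially dependent behavior that a ratio mask is meant to provide.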

DOI: 10.21437/Interspeech.2017-271

Cite as: Wang, X., Bao, C., Bao, F. (2017) A Mask Estimation Method Integrating Data Field Model for Speech Enhancement. Proc. Interspeech 2017, 1904-1908, DOI: 10.21437/Interspeech.2017-271.

  author={Xianyun Wang and Changchun Bao and Feng Bao},
  title={A Mask Estimation Method Integrating Data Field Model for Speech Enhancement},
  booktitle={Proc. Interspeech 2017},