INTERSPEECH 2015
16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

A Data-Driven Speech Enhancement Method Based on Modeled Long-Range Temporal Dynamics

Yue Hao, Changchun Bao, Feng Bao, Feng Deng

Beijing University of Technology, China

In this paper, a data-driven speech enhancement method based on modeled long-range temporal dynamics (LRTDs) is proposed. First, given speech and noise corpora, Gaussian mixture models (GMMs) of speech and noise are trained separately with the expectation-maximization (EM) algorithm. Then, the LRTDs are obtained from the GMMs. Next, based on the LRTDs, a noise robustness longest segment searching (NRLSS) method combined with the Vector Taylor Series (VTS) approximation algorithm is used to search the speech and noise corpora for the longest matching speech and noise segments (LMSNS). Finally, the speech spectrum is estimated from the obtained LMSNS, and a modified Wiener filter is constructed to further suppress residual noise. Test results show that the proposed method outperforms state-of-the-art speech enhancement methods.
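As a rough illustration of the final filtering stage, the sketch below computes the textbook Wiener gain, G = S / (S + N), for each frequency bin from estimated speech and noise power spectra and applies it to a noisy magnitude spectrum. This is not the modified Wiener filter of the paper, whose form the abstract does not specify. The function names `wiener_gain` and `apply_gain` and the `floor` parameter are assumptions introduced here for illustration only.

```python
from typing import List, Sequence


def wiener_gain(
    speech_psd: Sequence[float],
    noise_psd: Sequence[float],
    floor: float = 0.0,
) -> List[float]:
    """Per-bin standard Wiener gain G = S / (S + N).

    speech_psd and noise_psd are estimated power spectra for one frame.
    floor is a hypothetical lower bound on the gain (an assumption, not
    taken from the paper) that limits how strongly a bin is attenuated.
    """
    gains = []
    for s, n in zip(speech_psd, noise_psd):
        total = s + n
        # Guard against empty bins where both estimates are zero.
        g = s / total if total > 0.0 else 0.0
        gains.append(max(g, floor))
    return gains


def apply_gain(noisy_magnitude: Sequence[float], gains: Sequence[float]) -> List[float]:
    """Scale each bin of the noisy magnitude spectrum by its gain."""
    return [g * y for g, y in zip(gains, noisy_magnitude)]


if __name__ == "__main__":
    gains = wiener_gain([1.0, 3.0, 0.0], [1.0, 1.0, 2.0])
    print(gains)
    print(apply_gain([2.0, 4.0, 1.0], gains))
```

In a pipeline like the one described, the speech and noise power spectra would presumably come from the matched corpus segments; here they are simply passed in as lists.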

Bibliographic reference. Hao, Yue / Bao, Changchun / Bao, Feng / Deng, Feng (2015): "A data-driven speech enhancement method based on modeled long-range temporal dynamics", in INTERSPEECH-2015, 1790-1794.