This paper proposes a data-driven speech enhancement method based on modeled long-range temporal dynamics (LRTDs). First, given speech and noise corpora, separate Gaussian Mixture Models (GMMs) of the speech and the noise are trained with the expectation-maximization (EM) algorithm. Then, the LRTDs are obtained from the GMMs. Next, based on the LRTDs, a noise robustness longest segment searching (NRLSS) method combined with the Vector Taylor Series (VTS) approximation algorithm is used to search the speech and noise corpora for the longest matching speech and noise segments (LMSNS). Finally, the speech spectrum is estimated from the obtained LMSNS, and a modified Wiener filter is constructed to further suppress residual noise. The test results show that the proposed method outperforms state-of-the-art speech enhancement methods.
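As an illustration of the final filtering step, the following is a minimal sketch of a textbook per-bin Wiener gain, G = S / (S + N), applied to a noisy magnitude spectrum. It is not the modified Wiener filter of the paper, whose form the abstract does not specify. The function names `wiener_gain` and `apply_gain`, the `floor` parameter, and the example values are assumptions made for illustration only.

```python
from typing import List, Sequence


def wiener_gain(speech_psd: Sequence[float],
                noise_psd: Sequence[float],
                floor: float = 1e-3) -> List[float]:
    """Textbook Wiener gain per frequency bin: G = S / (S + N).

    `floor` (an assumed parameter) lower-bounds the gain so that no bin
    is attenuated to exactly zero.
    """
    if len(speech_psd) != len(noise_psd):
        raise ValueError("speech_psd and noise_psd must have the same length")
    gains = []
    for s, n in zip(speech_psd, noise_psd):
        denom = s + n
        # Guard against an empty bin (both estimates zero).
        g = s / denom if denom > 0.0 else 0.0
        gains.append(max(g, floor))
    return gains


def apply_gain(noisy_magnitude: Sequence[float],
               gains: Sequence[float]) -> List[float]:
    """Scale each bin of the noisy magnitude spectrum by its gain."""
    return [g * y for g, y in zip(gains, noisy_magnitude)]


# Hypothetical three-bin example: estimated speech and noise power per bin.
gains = wiener_gain([1.0, 0.0, 3.0], [1.0, 2.0, 1.0])
enhanced = apply_gain([2.0, 2.0, 4.0], gains)
```

In a pipeline like the one described above, the speech and noise power estimates would presumably come from the matched segments (LMSNS); here they are simply hard-coded.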
Bibliographic reference. Hao, Yue / Bao, Changchun / Bao, Feng / Deng, Feng (2015): "A data-driven speech enhancement method based on modeled long-range temporal dynamics", In INTERSPEECH-2015, 1790-1794.