15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Multi-Channel Speech Enhancement Using Sparse Coding on Local Time-Frequency Structures

Zhiyuan Zhou (1), Zhaogui Ding (1), Weifeng Li (1), Zhiyong Wu (1), Longbiao Wang (2), Qingmin Liao (1)

(1) Tsinghua University, China
(2) Nagaoka University of Technology, Japan

This paper proposes a novel multi-channel speech enhancement technique. In contrast to conventional beamforming and blind source separation methods, we focus on the local sparsity of speech signals. The technique exploits the difference in local time-frequency structure between the target speech and the interfering signals to enhance the target speech. We first estimate the local structures of the speech and noise signals at each time-frequency bin to form a local dictionary, and then recover the clean speech via sparse coding. The proposed algorithm is simple to implement and requires no prior knowledge of the speech or noise. Our experimental evaluations demonstrate that the proposed method suppresses the interferer while preserving the target speech better than some conventional methods.
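The abstract gives no implementation details, so the following is only an illustrative sketch of the general idea of sparse coding over a small local dictionary, not the authors' algorithm. It assumes a real-valued feature vector for one time-frequency neighbourhood (here a hypothetical flattened 3x3 patch), a dictionary whose columns are already-estimated speech and noise structures, and orthogonal matching pursuit as the sparse solver. All names (`omp`, `enhance_patch`, `speech_idx`) and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def omp(D, y, n_nonzero):
    """Greedy sparse code of y over the unit-norm columns of D
    (orthogonal matching pursuit, used here as an assumed solver)."""
    residual = y.copy()
    support = []
    coef = np.zeros(D.shape[1])
    sol = None
    for _ in range(n_nonzero):
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0  # never pick the same atom twice
        k = int(np.argmax(corr))
        if corr[k] < 1e-12:      # nothing left to explain
            break
        support.append(k)
        # Re-fit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef


def enhance_patch(D, y, speech_idx, n_nonzero=2):
    """Code y over the local dictionary, then keep only the part
    explained by the speech atoms (hypothetical reconstruction rule)."""
    coef = omp(D, y, n_nonzero)
    return D[:, speech_idx] @ coef[speech_idx]


# Synthetic example: one speech atom and one noise atom for a 3x3 patch.
rng = np.random.default_rng(0)
D = np.stack([rng.standard_normal(9), rng.standard_normal(9)], axis=1)
D /= np.linalg.norm(D, axis=0)           # unit-norm atoms
y = 2.0 * D[:, 0] + 0.5 * D[:, 1]        # observed mixture of both
speech_est = enhance_patch(D, y, speech_idx=[0])
```

In this toy setting the mixture lies exactly in the span of the two atoms, so the speech component is recovered exactly; with real recordings the quality would depend on how well the local structures are estimated, which is the part the paper itself addresses.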


Bibliographic reference.  Zhou, Zhiyuan / Ding, Zhaogui / Li, Weifeng / Wu, Zhiyong / Wang, Longbiao / Liao, Qingmin (2014): "Multi-channel speech enhancement using sparse coding on local time-frequency structures", In INTERSPEECH-2014, 2824-2827.