12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Monaural Speech Separation Based on a 2D Processing and Harmonic Analysis

Azam Rabiee (1), Saeed Setayeshi (2), Soo-Young Lee (3)

(1) Islamic Azad University, Iran
(2) Amirkabir University of Technology, Iran
(3) KAIST, Korea

This paper proposes a new Computational Auditory Scene Analysis (CASA) approach to monaural speech separation based on 2D spectro-temporal analysis and harmonic separation. The 2D processing, known as the Grating Compression Transform (GCT), analyzes the spectro-temporal content of the spectrogram, mimicking the processing of the primary auditory cortex. The pitches estimated by the GCT analysis are then used for separation by harmonic magnitude suppression (HMS). A key advantage of the model is that it requires no prior training on a specific corpus. A baseline system based on harmonic separation is designed for comparison. Because the baseline differs from the proposed system only in lacking the auditory-cortex-like analysis, the signal-to-interference ratio (SIR) results illustrate the importance of that analysis for this task.
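
The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names gct_pitch and suppress_harmonics, all parameter values, the linear frequency axis, the single dominant pitch per patch, and the hard zeroing of bins near harmonics are assumptions made here for illustration. The sketch takes a 2D Fourier transform of a local spectrogram patch, reads a pitch from the position of the strongest peak along the frequency-modulation axis, and then zeroes the magnitudes near multiples of that pitch.

```python
import numpy as np


def gct_pitch(patch, bin_hz, f0_min=60.0, f0_max=400.0, pad=8):
    """Estimate one pitch (Hz) from a spectrogram patch (freq bins x frames).

    Hypothetical helper: assumes a linear frequency axis with spacing bin_hz
    and a single dominant harmonic series inside the patch.
    """
    n_freq, n_time = patch.shape
    # Remove the mean and taper so the DC term and patch edges do not dominate.
    window = np.outer(np.hanning(n_freq), np.hanning(n_time))
    tapered = (patch - patch.mean()) * window
    # 2D Fourier transform of the patch, zero-padded along the frequency axis.
    n_fft = pad * n_freq
    gct = np.abs(np.fft.fft2(tapered, s=(n_fft, n_time)))
    # Harmonics spaced f0 Hz apart repeat every f0 / bin_hz bins, which puts
    # a peak at row k = n_fft * bin_hz / f0. Search only plausible pitches.
    k_lo = int(np.ceil(n_fft * bin_hz / f0_max))
    k_hi = int(np.floor(n_fft * bin_hz / f0_min))
    region = gct[k_lo:k_hi + 1, :]
    k_peak, _ = np.unravel_index(np.argmax(region), region.shape)
    return bin_hz * n_fft / (k_lo + k_peak)


def suppress_harmonics(mag_frame, bin_hz, f0, half_width_hz=20.0):
    """Zero the magnitude bins within half_width_hz of each multiple of f0.

    A crude stand-in for harmonic magnitude suppression (assumption).
    """
    freqs = np.arange(mag_frame.shape[0]) * bin_hz
    nearest = np.round(freqs / f0) * f0  # closest multiple of f0 per bin
    near_harmonic = (np.abs(freqs - nearest) <= half_width_hz) & (nearest > 0)
    out = mag_frame.copy()
    out[near_harmonic] = 0.0
    return out


# Synthetic patch: harmonic ripple with a 200 Hz spacing, constant over time.
bin_hz = 10.0
freqs = np.arange(100) * bin_hz
ripple = 1.0 + np.cos(2 * np.pi * freqs / 200.0)
patch = np.tile(ripple[:, None], (1, 20))

f0 = gct_pitch(patch, bin_hz)
cleaned = suppress_harmonics(patch[:, 0], bin_hz, f0)
print(f"estimated pitch: {f0:.1f} Hz")
```

Only the row index of the peak is used because a pitch that changes over the patch tilts the harmonic lines and moves the peak sideways, while its component along the frequency-modulation axis still reflects the harmonic spacing. A practical system would need to handle two simultaneous pitches and use a softer suppression than hard zeroing.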

Bibliographic reference. Rabiee, Azam / Setayeshi, Saeed / Lee, Soo-Young (2011): "Monaural speech separation based on a 2D processing and harmonic analysis", in INTERSPEECH-2011, 1749-1752.