7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Comparative Evaluation of CASA and BSS Models for Subband Cocktail-Party Speech Separation

Frédéric Berthommier (1), Seungjin Choi (2)

(1) Institut de la Communication Parlée/INPG, France; (2) Pohang University of Science and Technology, Korea

For speech segregation, a recurrent blind source separation (BSS) model is tested alongside a Computational Auditory Scene Analysis (CASA) model, which is based on the localisation cue and the evaluation of the Time Delay Of Arrival (TDOA). The test database is composed of 332 binary mixture sentences recorded in stereo with a static set-up. These are truncated to 1 second for the simulations. To apply the two models, we divide the frequency range into a variable number of subbands, which are processed independently. We then evaluate the gain, using reference signals recorded in isolation. After a careful analysis, we find similar gains of about 2-3 dB for both methods. Varying the number of subbands allows an optimisation: we obtain a significant peak at 4 subbands for the CASA model, and a maximum at 2 subbands for the BSS model.
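As an illustration of the two quantities named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the TDOA is taken as the lag that maximises the cross-correlation between the two channels, and that the gain is the signal-to-noise ratio improvement, in dB, measured against the reference recorded in isolation. The function names `estimate_tdoa` and `gain_db`, and both definitions, are assumptions made for this example; the abstract does not specify them.

```python
import math


def estimate_tdoa(left, right, max_lag):
    """Return the lag (in samples) that maximises the cross-correlation.

    Assumed convention: a positive lag means `right` is a delayed
    copy of `left`.
    """
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(left)):
            j = i + lag
            if 0 <= j < len(right):
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def gain_db(reference, mixture, estimate):
    """SNR improvement in dB of `estimate` over `mixture`.

    Assumed definition: everything that differs from the isolated
    reference signal is counted as noise.
    """
    def snr_db(signal):
        signal_power = sum(r * r for r in reference)
        noise_power = sum((s - r) ** 2 for s, r in zip(signal, reference))
        return 10.0 * math.log10(signal_power / noise_power)

    return snr_db(estimate) - snr_db(mixture)
```

In a subband setting, each function would be applied separately to the band-filtered signals, so that each subband gets its own delay estimate and its own gain figure.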

