This paper addresses speech source localization, source counting and source separation for an unknown number of sources in a stereo recording scenario. In the first stage, the angles of arrival of the individual source signals are estimated by a peak-finding scheme applied to the angular spectrum, which is derived using non-linear GCC-PHAT. Then, with the channel mixture coefficients known, we propose a Maximum Likelihood (ML) approach for separating the sources: the predominant source in each time-frequency bin is identified through ML estimation under a diffuse noise model. The separation performance improves on that of a binary time-frequency masking method. Performance is measured with existing metrics for blind source separation evaluation. The experiments are performed on synthetic speech mixtures in both anechoic and reverberant environments.
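The first stage (angular spectrum followed by peak finding) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain GCC-PHAT summed over frames rather than the non-linear variant named above, and it assumes a far-field model with two microphones. The function names, the microphone spacing `mic_distance`, the frame settings and the relative peak threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def gcc_phat_angular_spectrum(x_left, x_right, fs, mic_distance,
                              n_fft=1024, hop=512, n_angles=181, c=343.0):
    """Plain GCC-PHAT, pooled over frames, on a grid of candidate angles.

    Assumption: far-field source, so the delay of the right channel
    relative to the left is mic_distance * sin(angle) / c.
    """
    window = np.hanning(n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    angles = np.linspace(-90.0, 90.0, n_angles)
    delays = mic_distance * np.sin(np.deg2rad(angles)) / c
    # Phase compensation for every (candidate delay, frequency) pair.
    steering = np.exp(-2j * np.pi * np.outer(delays, freqs))
    spectrum = np.zeros(n_angles)
    for start in range(0, len(x_left) - n_fft + 1, hop):
        X1 = np.fft.rfft(window * x_left[start:start + n_fft])
        X2 = np.fft.rfft(window * x_right[start:start + n_fft])
        cross = X1 * np.conj(X2)
        cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase only
        spectrum += np.real(steering @ cross)
    return angles, spectrum

def pick_peaks(spectrum, rel_threshold=0.5):
    """Indices of local maxima above a fraction of the global maximum.

    The number of returned peaks serves as the source count estimate.
    """
    floor = rel_threshold * spectrum.max()
    return [i for i in range(1, len(spectrum) - 1)
            if spectrum[i] >= floor
            and spectrum[i] > spectrum[i - 1]
            and spectrum[i] >= spectrum[i + 1]]
```

In this sketch the number of peaks returned by `pick_peaks` plays the role of the source count, and the corresponding entries of `angles` are the estimated angles of arrival. The threshold of 0.5 is arbitrary and would need tuning for real speech mixtures, particularly in reverberant conditions.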
Bibliographic reference. Mirzaei, Sayeh / Van hamme, Hugo / Norouzi, Yaser (2014): "Blind speech source localization, counting and separation for 2-channel convolutive mixtures in a reverberant environment", In INTERSPEECH-2014, 860-864.