Accurate microphone-based speaker localization in real-world environments such as offices or meeting rooms must track both a single speaker and multiple concurrent speakers in the presence of reverberation and background noise. Our Multiband Joint Position-Pitch (M-PoPi) algorithm for circular microphone arrays already achieves a frame-wise localization estimation score of about 95% when tracking a single speaker in a noisy, reverberant setting. In this paper, we present two extensions of the M-PoPi algorithm that also improve localization accuracy for multiple concurrent speakers: a weighted spectro-temporal fragment analysis as a pre-processing step and particle filter-based tracking as a post-processing step. In experiments on real-world recordings of two concurrent speakers in a typically reverberant meeting room, the frame-wise localization estimation score improves from 43% with the plain M-PoPi algorithm to 66% with both extensions.
Bibliographic reference. Habib, Tania / Romsdorfer, Harald (2010): "Concurrent speaker localization using multi-band position-pitch (M-PoPi) algorithm with spectro-temporal pre-processing", in INTERSPEECH-2010, 2774-2777.