Multiple-speaker localization algorithms generally require a binary detector that classifies each location estimate as source or noise. This is mainly because the number of sources is unknown and time-varying, and because of noise and reverberation. In this paper, we propose an unsupervised learning approach based on a naive Bayesian classifier. The proposed approach couples two speaker location features: 1) the steered response power at the location estimate, and 2) the corresponding maximum likelihood error, which characterizes the variance of the estimate. The latter is experimentally shown to be highly correlated with the steered power at the location estimate. The proposed method is further extended to control the misclassification rate through the use of a loss function. The approach is general and can easily be extended to integrate more speaker/speech features. Experiments on the AV16.3 corpus show the effectiveness of the proposed approach.
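As a rough illustration of the idea summarized above, the following sketch fits a two-class naive Bayes model without labels and then applies a loss-weighted decision rule. It is not the implementation from the paper: the Gaussian class-conditional densities, the EM fitting procedure, the median-based initialization, the synthetic feature values, and the names `posterior`, `fit_unsupervised_nb`, `detect`, `loss_miss` and `loss_false_alarm` are all assumptions made for this example. The two columns of `X` stand in for the steered response power and the maximum likelihood error of each location estimate.

```python
import numpy as np


def posterior(X, prior, mean, var):
    """Class posteriors under a naive Bayes model with Gaussian features (assumed)."""
    # Sum of per-feature log-densities: features are treated as independent given the class.
    log_lik = -0.5 * (
        np.log(2.0 * np.pi * var)[None] + (X[:, None, :] - mean[None]) ** 2 / var[None]
    ).sum(axis=2)
    log_post = np.log(prior)[None] + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


def fit_unsupervised_nb(X, n_iter=50):
    """Fit priors, means and variances of two classes with EM, using no labels."""
    n = X.shape[0]
    # Assumed initialization: estimates with above-median first feature lean to class 1.
    high = X[:, 0] > np.median(X[:, 0])
    resp = np.where(high[:, None], [0.1, 0.9], [0.9, 0.1])
    for _ in range(n_iter):
        # M-step: weighted moments per class and feature.
        nk = resp.sum(axis=0) + 1e-12
        prior = nk / n
        mean = (resp.T @ X) / nk[:, None]
        var = np.maximum((resp.T @ X**2) / nk[:, None] - mean**2, 1e-6)
        # E-step: update soft assignments.
        resp = posterior(X, prior, mean, var)
    return prior, mean, var


def detect(X, prior, mean, var, loss_miss=1.0, loss_false_alarm=1.0):
    """Label an estimate as a source when that decision has the lower expected loss."""
    # Assumption: the source class is the one with the higher mean steered power.
    src = int(np.argmax(mean[:, 0]))
    p_source = posterior(X, prior, mean, var)[:, src]
    # Minimum-risk rule: choose "source" if loss_false_alarm * (1 - p) < loss_miss * p.
    threshold = loss_false_alarm / (loss_false_alarm + loss_miss)
    return p_source > threshold


# Synthetic stand-in data (hypothetical values, not taken from AV16.3).
rng = np.random.default_rng(0)
source = np.column_stack([rng.normal(5.0, 1.0, 200), rng.normal(1.0, 0.5, 200)])
noise = np.column_stack([rng.normal(1.0, 1.0, 200), rng.normal(4.0, 0.5, 200)])
X = np.vstack([source, noise])
truth = np.r_[np.ones(200, dtype=bool), np.zeros(200, dtype=bool)]

params = fit_unsupervised_nb(X)
labels = detect(X, *params)
```

Raising `loss_false_alarm` relative to `loss_miss` moves the posterior threshold toward 1, so fewer estimates are accepted as sources; this is one simple way a loss function can trade false alarms against missed detections.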
Bibliographic reference. Oualil, Youssef / Faubel, Friedrich / Klakow, Dietrich (2013): "An unsupervised Bayesian classifier for multiple speaker detection and localization", In INTERSPEECH-2013, 2943-2947.