This study describes an experiment designed to establish the maximum number of competing speakers that a human listener can detect accurately, and compares the results with those of a distance-based estimator operating in the frequency domain. We mixed a set of high-quality recordings of continuous speech, produced by well-known public figures (actors, journalists, and politicians) as well as by unknown speakers, and played the resulting tracks to each listener in a target group. The volunteers were asked how many speakers they counted in total and how they arrived at their answer. Human listeners identified the correct number of speakers in 31% of cases, whereas the estimator, configured with a set of equally spaced thresholds, reached 66% accuracy. The paper also summarizes the strategies that listeners reported using to aid detection.
Bibliographic reference. Andrei, Valentin / Cucu, Horia / Buzo, Andi / Burileanu, Corneliu (2014): "Detecting the number of competing speakers — human selective hearing versus spectrogram distance based estimator", In INTERSPEECH-2014, 467-470.