ISCA Archive Interspeech 2015

Counting competing speakers in a timeframe — human versus computer

Valentin Andrei, Horia Cucu, Andi Buzo, Corneliu Burileanu

We propose an automated solution for computing the number of simultaneously active speakers within a timeframe. The method is studied in parallel with a perception experiment conducted with 28 volunteers, who were asked to detect how many speakers talk simultaneously in several recordings of varying length. In this study we focus on how listening time and the use of familiar voices in the recordings affect the correct detection ratio. For the automated method, we discuss the influence of noise and how the detection error evolves with speech duration. We observe that, when capturing clean speech sources, the method is 76% accurate even for 10 simultaneous speakers, for speech segments longer than 3.5 seconds. The volunteers did not systematically detect more than 4 competing speakers correctly, even when listening for up to 80 seconds.

doi: 10.21437/Interspeech.2015-673

Cite as: Andrei, V., Cucu, H., Buzo, A., Burileanu, C. (2015) Counting competing speakers in a timeframe — human versus computer. Proc. Interspeech 2015, 3399-3403, doi: 10.21437/Interspeech.2015-673

  author={Valentin Andrei and Horia Cucu and Andi Buzo and Corneliu Burileanu},
  title={{Counting competing speakers in a timeframe — human versus computer}},
  booktitle={Proc. Interspeech 2015},