Speaker Activity Detection and Minimum Variance Beamforming for Source Separation

Enea Ceolini, Jithendar Anumula, Adrian Huber, Ilya Kiselev, Shih-Chii Liu


This work proposes a framework that renders minimum variance beamforming blind, allowing for source separation in real-world environments with an ad-hoc multi-microphone setup under no assumption other than knowledge of the number of speakers. The framework supports multiple simultaneously active speakers and estimates the activity of each individual speaker at a flexible time resolution. These estimated speaker activities are then used to calibrate the beamforming algorithm. The framework is tested with three different speaker activity detection (SAD) methods, two based on classical algorithms and one that is event-driven. When tested in real-world reverberant scenarios, our methods achieve a high signal-to-interference ratio (SIR) of around 20 dB and a short-time objective intelligibility (STOI) score of 0.85, close to the optimal beamforming results of 22 dB SIR and 0.89 STOI.
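The calibration idea described in the abstract — use speaker-activity estimates to learn the interference statistics, then apply a minimum variance distortionless response (MVDR) beamformer — can be sketched on toy narrowband data. The steering vectors, synthetic signals, and the idealized SAD step below are illustrative assumptions, not the paper's actual method:

```python
import numpy as np

rng = np.random.default_rng(0)

def mvdr_weights(noise_cov, steering):
    """MVDR weights w = R^-1 d / (d^H R^-1 d): minimize output power
    subject to a distortionless response toward the target direction."""
    Rinv_d = np.linalg.solve(noise_cov, steering)
    return Rinv_d / (steering.conj() @ Rinv_d)

# Toy narrowband scene: 4 mics, one target and one interfering speaker
# (hypothetical far-field steering vectors for a uniform linear array).
n_mics, n_frames = 4, 2000
d_target = np.exp(1j * np.pi * np.arange(n_mics) * np.sin(0.3))
d_interf = np.exp(1j * np.pi * np.arange(n_mics) * np.sin(-0.9))
s = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)
v = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)
x = np.outer(d_target, s) + np.outer(d_interf, v)  # mixture at the mics

# Idealized SAD step: frames where only the interferer is active give the
# interference covariance that calibrates the beamformer. In the paper this
# selection comes from the estimated per-speaker activities.
noise_only = np.outer(d_interf, v)
Rn = noise_only @ noise_only.conj().T / n_frames + 1e-6 * np.eye(n_mics)

w = mvdr_weights(Rn, d_target)
y = w.conj() @ x  # beamformer output: target passes, interferer is nulled
```

The distortionless constraint guarantees `w.conj() @ d_target == 1`, so the target arrives unscaled while energy from the interferer's direction is minimized; the small diagonal loading on `Rn` keeps the covariance invertible.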


DOI: 10.21437/Interspeech.2018-1606

Cite as: Ceolini, E., Anumula, J., Huber, A., Kiselev, I., Liu, S.-C. (2018) Speaker Activity Detection and Minimum Variance Beamforming for Source Separation. Proc. Interspeech 2018, 836-840, DOI: 10.21437/Interspeech.2018-1606.


@inproceedings{Ceolini2018,
  author={Enea Ceolini and Jithendar Anumula and Adrian Huber and Ilya Kiselev and Shih-Chii Liu},
  title={Speaker Activity Detection and Minimum Variance Beamforming for Source Separation},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={836--840},
  doi={10.21437/Interspeech.2018-1606},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1606}
}