ISCA Archive Interspeech 2013

Voice activity classification for automatic bi-speaker adaptive beamforming in speech separation

Thuy N. Tran, William Cowley, André Pollok

A simple system with low computational complexity is proposed in this paper for bi-speaker speech separation. The system consists of a voice activity classification (VAC) module and an adaptive bi-beamformer module for speech separation using a microphone array. The first module identifies the active speaker(s) and allows the system to control the adaptation of the second module automatically. The VAC is based on a novel two-step classification method. The first step uses a robust VAC method based on our previous work on the beamformer-output-ratio of a bi-beamforming system. The second step refines the VAC results using a novel method derived from an analytical result on the output power of an adaptive beamformer. The system is tested in reverberant environments with both synthesized and real recordings. The synthesized recordings contain two speakers, background speech, and noise. The real recording contains two speakers speaking spontaneously. The VAC results satisfy a conservative classification scheme to avoid the signal cancellation problem. The final separation outputs are compared with the ideal outputs provided by genie-aided adaptive beamformers, which have perfect VAC knowledge. The results show that the proposed automatic system achieves high performance close to that of the ideal system.
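As a rough illustration of the first VAC step, the sketch below labels frames by comparing the output powers of two beamformers, one steered at each speaker. It is not the authors' algorithm: the function names, the frame length, the 6 dB ratio threshold, and the silence floor are all assumptions chosen for the example, and the refinement step based on adaptive beamformer output power is not modelled. Frames that are not clearly dominated by one speaker are labelled "both", which mirrors the conservative scheme in spirit: an adaptive beamformer would only be allowed to adapt when its own target is confidently inactive.

```python
import math


def frame_power(samples):
    """Mean power of one frame of samples."""
    return sum(x * x for x in samples) / len(samples)


def classify_frames(out1, out2, frame_len=4, ratio_db=6.0, floor=1e-6):
    """Label each frame from the power ratio of two beamformer outputs.

    out1, out2: output signals of beamformers steered at speaker 1 and 2.
    frame_len, ratio_db, floor: illustrative values (assumptions), not
    parameters taken from the paper.
    """
    eps = 1e-12  # avoids division by zero and log of zero
    n = min(len(out1), len(out2))
    labels = []
    for start in range(0, n - frame_len + 1, frame_len):
        p1 = frame_power(out1[start:start + frame_len])
        p2 = frame_power(out2[start:start + frame_len])
        if max(p1, p2) < floor:
            labels.append("silence")
            continue
        ratio = 10.0 * math.log10((p1 + eps) / (p2 + eps))
        if ratio > ratio_db:
            labels.append("speaker1")
        elif ratio < -ratio_db:
            labels.append("speaker2")
        else:
            # Conservative choice: ambiguous frames count as both active,
            # so no beamformer would adapt on them.
            labels.append("both")
    return labels


# Toy signals: speaker 1 dominant, speaker 2 dominant, both, silence.
out1 = [1.0] * 4 + [0.1] * 4 + [1.0] * 4 + [0.0] * 4
out2 = [0.1] * 4 + [1.0] * 4 + [1.0] * 4 + [0.0] * 4
labels = classify_frames(out1, out2)
```

In a full system, these labels would gate adaptation: under the assumption above, the beamformer extracting speaker 1 would update its weights only on "speaker2" frames, and vice versa.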


doi: 10.21437/Interspeech.2013-234

Cite as: Tran, T.N., Cowley, W., Pollok, A. (2013) Voice activity classification for automatic bi-speaker adaptive beamforming in speech separation. Proc. Interspeech 2013, 817-821, doi: 10.21437/Interspeech.2013-234

@inproceedings{tran13b_interspeech,
  author={Thuy N. Tran and William Cowley and André Pollok},
  title={{Voice activity classification for automatic bi-speaker adaptive beamforming in speech separation}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={817--821},
  doi={10.21437/Interspeech.2013-234}
}