ISCA Archive Interspeech 2006

Tracking and beamforming for multiple simultaneous speakers with probabilistic data association filters

Tobias Gehrig, Ulrich Klee, John W. McDonough, Shajith Ikbal, Matthias Wölfel, Christian Fügen

In prior work, we developed a speaker tracking system based on an extended Kalman filter using time delays of arrival (TDOAs) as acoustic features. While this system functioned well, its utility was limited to scenarios in which a single speaker was to be tracked. In this work, we remove this restriction by generalizing the iterated extended Kalman filter (IEKF), first to a probabilistic data association filter, which incorporates a clutter model for rejection of spurious acoustic events, and then to a joint probabilistic data association filter (JPDAF), which maintains a separate state vector for each active speaker. In a set of experiments conducted on seminar and meeting data, the JPDAF speaker tracking system reduced the multiple object tracking error from 20.7% to 14.3% with respect to the IEKF system. In a set of automatic speech recognition experiments conducted on the output of a 64-channel microphone array, which was beamformed using automatic speaker position estimates, applying the JPDAF tracking system reduced word error rate from 67.3% to 66.0%. Moreover, the word error rate on the beamformed output was 13.0% absolute lower than on a single channel of the array.


doi: 10.21437/Interspeech.2006-650

Cite as: Gehrig, T., Klee, U., McDonough, J.W., Ikbal, S., Wölfel, M., Fügen, C. (2006) Tracking and beamforming for multiple simultaneous speakers with probabilistic data association filters. Proc. Interspeech 2006, paper 2038-Thu2FoP.5, doi: 10.21437/Interspeech.2006-650

@inproceedings{gehrig06b_interspeech,
  author={Tobias Gehrig and Ulrich Klee and John W. McDonough and Shajith Ikbal and Matthias Wölfel and Christian Fügen},
  title={{Tracking and beamforming for multiple simultaneous speakers with probabilistic data association filters}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 2038-Thu2FoP.5},
  doi={10.21437/Interspeech.2006-650}
}