8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Crosscorrelation-based Multispeaker Speech Activity Detection

Kornel Laskowski, Qin Jin, Tanja Schultz

Carnegie Mellon University, USA

We propose an algorithm for segmenting multispeaker meeting audio, recorded with personal channel microphones, into speech and non-speech intervals for each microphone's wearer. Such segmentation is necessary before further audio processing because, despite the use of close-talking microphones, the channels exhibit a high degree of crosstalk due to unbalanced calibration and small inter-speaker distance. The proposed algorithm is based on the short-time crosscorrelation of all channel pairs. It requires no prior training and executes in one fifth of real time on modern architectures. Using meeting audio collected at several sites, we present error rates for the segmentation task that do not appear to be correlated with microphone type or number of speakers. We also present the resulting improvement in speech recognition accuracy when segmentation is provided by this algorithm.
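
To make the central quantity concrete, the following is a minimal illustrative sketch of a short-time normalized crosscorrelation computed for every channel pair in one analysis frame. It is not the authors' implementation: the function names (xcorr_peak, pairwise_peaks), the frame length, the lag range, and the synthetic two-channel signal are all assumptions made for illustration, and the abstract does not say how the crosscorrelation values are turned into speech/non-speech decisions.

```python
import math
import random


def xcorr_peak(x, y, start, frame_len, max_lag):
    """Return (lag, value) of the largest normalized crosscorrelation between
    the frame x[start:start+frame_len] and the same-length frame of y shifted
    by lag samples. Returns (0, -1.0) if no lag yields a valid frame.
    Hypothetical helper, not taken from the paper."""
    frame_x = x[start:start + frame_len]
    energy_x = sum(v * v for v in frame_x)
    best_lag, best_val = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        lo = start + lag
        if lo < 0 or lo + frame_len > len(y):
            continue  # shifted frame would fall outside the signal
        frame_y = y[lo:lo + frame_len]
        energy_y = sum(v * v for v in frame_y)
        if energy_x == 0.0 or energy_y == 0.0:
            continue  # a silent frame has no defined normalized correlation
        dot = sum(a * b for a, b in zip(frame_x, frame_y))
        val = dot / math.sqrt(energy_x * energy_y)
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag, best_val


def pairwise_peaks(channels, start, frame_len, max_lag):
    """Compute the crosscorrelation peak for every unordered channel pair."""
    peaks = {}
    for i in range(len(channels)):
        for j in range(i + 1, len(channels)):
            peaks[(i, j)] = xcorr_peak(
                channels[i], channels[j], start, frame_len, max_lag
            )
    return peaks


# Synthetic example (assumed data): channel 1 picks up an attenuated copy of
# channel 0 delayed by 3 samples, a crude stand-in for crosstalk.
rng = random.Random(0)
source = [rng.uniform(-1.0, 1.0) for _ in range(400)]
delay = 3
chan0 = source
chan1 = [0.0] * delay + [0.4 * v for v in source[:-delay]]

peaks = pairwise_peaks([chan0, chan1], start=100, frame_len=64, max_lag=10)
lag01, val01 = peaks[(0, 1)]
print(lag01, round(val01, 3))
```

In this toy setup the peak for the pair (0, 1) falls at the simulated delay of 3 samples, which shows how the sign and position of the peak lag can indicate which microphone is closer to the active speaker. A real system would additionally need a decision rule and smoothing over time, which the abstract does not specify.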

Bibliographic reference. Laskowski, Kornel / Jin, Qin / Schultz, Tanja (2004): "Crosscorrelation-based multispeaker speech activity detection", in INTERSPEECH-2004, 973-976.