8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Improved Location Features for Meeting Speaker Diarization

Scott Otterson

University of Washington, USA

This paper proposes several improvements to the correlation-based location features recently used in meeting speaker diarization. A speech-specific alternative to the generalized cross-correlation with phase transform (GCC-PHAT) algorithm is tested and shown to provide equal or better results without noise reduction or continuity-enforcing smoothing. The limitations of a single correlation reference waveform are discussed, and it is shown how a multi-band energy ratio feature can help overcome them, yielding significantly improved performance. An all-pairs correlation is also proposed; combined with energy ratios, it too improves upon the baseline system. The best combination, however, pairs the baseline correlation features with energy ratios.
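
For readers unfamiliar with the GCC-PHAT baseline named in the abstract, the following is a minimal sketch of the standard textbook delay estimator between one microphone channel and a reference channel. It is not the speech-specific alternative the paper proposes, and it is not the author's implementation; the function name `gcc_phat_delay`, its parameters, and the small regularization constant are assumptions made for illustration only.

```python
import numpy as np

def gcc_phat_delay(sig, ref, fs, max_tau=None):
    """Estimate the delay (seconds) of `sig` relative to `ref` with GCC-PHAT.

    Hypothetical helper for illustration; a positive result means `sig`
    lags `ref`.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)

    # Cross-power spectrum, whitened by its magnitude (the PHAT weighting).
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12  # small constant avoids division by zero

    # Back to the lag domain.
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to plausible lags, then centre lag 0.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.roll(cc, max_shift)[: 2 * max_shift + 1]

    # The peak position, relative to the centre, is the delay in samples.
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(1024)
    sig = np.concatenate((np.zeros(5), ref[:-5]))  # ref delayed by 5 samples
    print(gcc_phat_delay(sig, ref, fs=16000) * 16000)
```

In a diarization front end of the kind the abstract describes, one such delay per microphone pair and analysis frame would typically form the location feature vector; the framing and pairing scheme are not specified here and are left to the full paper.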

Bibliographic reference: Otterson, Scott (2007): "Improved location features for meeting speaker diarization", in INTERSPEECH-2007, 1849-1852.