This paper proposes several improvements to the correlation-based location features recently used in meeting speaker diarization. A speech-specific alternative to the generalized cross-correlation with phase transform (GCC-PHAT) algorithm is tested and shown to provide equal or better results without noise reduction or continuity-enforcing smoothing. The limitations of a single correlation reference waveform are discussed, and a multi-band energy ratio feature is shown to help overcome them, yielding significantly improved performance. An all-pairs correlation is also proposed; combined with energy ratios, it too improves upon the baseline system. The best combination, however, is the baseline correlation features with energy ratios.
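For readers unfamiliar with the baseline, the following is a minimal sketch of standard GCC-PHAT delay estimation between two microphone channels. It is not the speech-specific alternative proposed in the paper, whose details are not given in this abstract. The function names `dft` and `gcc_phat_delay`, the `max_lag` parameter, and the synthetic test signal are all illustrative assumptions; a naive O(n^2) transform is used only to keep the example free of external libraries, and a real system would use an FFT.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (hypothetical helper).

    Adequate for short illustrative frames; use an FFT in practice.
    """
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_delay(sig, ref, max_lag):
    """Estimate the delay of `sig` relative to `ref`, in samples.

    Standard GCC-PHAT: whiten the cross-spectrum so that only phase
    remains, transform back, and pick the lag with the largest peak.
    """
    # Zero-pad both channels to a common length so that the circular
    # correlation does not wrap around.
    n = len(sig) + len(ref)
    S = dft(list(sig) + [0.0] * (n - len(sig)))
    R = dft(list(ref) + [0.0] * (n - len(ref)))

    # Cross-spectrum, then PHAT weighting (divide by magnitude).
    eps = 1e-12  # guards against division by zero in empty bins
    whitened = []
    for s, r in zip(S, R):
        c = s * r.conjugate()
        whitened.append(c / (abs(c) + eps))

    cc = [v.real for v in dft(whitened, inverse=True)]

    # Negative lags live at the end of the circular buffer, hence `% n`.
    return max(range(-max_lag, max_lag + 1), key=lambda lag: cc[lag % n])


# Synthetic example (assumed data): channel 2 is channel 1 delayed by 5 samples.
random.seed(0)
channel_1 = [random.gauss(0.0, 1.0) for _ in range(64)]
channel_2 = [0.0] * 5 + channel_1
estimated_delay = gcc_phat_delay(channel_2, channel_1, max_lag=10)
```

In a diarization front end, a delay of this kind would typically be computed per frame for each microphone against a reference channel, which is the single-reference setup whose limitations the abstract discusses.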
Bibliographic reference. Otterson, Scott (2007): "Improved location features for meeting speaker diarization", in INTERSPEECH-2007, 1849-1852.