We propose using modulation spectrogram features for speaker diarization. These features capture longer-term characteristics of the acoustic signal than the widely used MFCCs, so combining the two feature types offers potential for improvement. With the state-of-the-art ICSI speaker diarization system, the combination yields a 20.77% relative reduction in diarization error rate (DER) on the NIST Rich Transcription 2007 task compared with the MFCC-only system.
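As a reading aid, the reported figure can be understood through the conventional definition of a relative error reduction: the difference between baseline and new DER, divided by the baseline DER. The sketch below assumes that convention; the function name `relative_der_reduction` and the input values are hypothetical illustrations and are not taken from the paper.

```python
def relative_der_reduction(baseline_der: float, new_der: float) -> float:
    """Return the relative DER reduction in percent.

    Assumes the conventional definition:
        100 * (baseline - new) / baseline
    """
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return 100.0 * (baseline_der - new_der) / baseline_der


# Hypothetical DER values for illustration only; NOT results from the paper.
example = relative_der_reduction(baseline_der=20.0, new_der=16.0)
print(round(example, 2))
```

Under this definition, a relative figure says nothing about the absolute DER values on its own; both the baseline and the combined-system DER are needed to recover them.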
Bibliographic reference. Vinyals, Oriol / Friedland, Gerald (2008): "Modulation spectrogram features for improved speaker diarization", In INTERSPEECH-2008, 630-633.