12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

LP Residual Features for Robust, Privacy-Sensitive Speaker Diarization

Sree Hari Krishnan Parthasarathi, Hervé Bourlard, Daniel Gatica-Perez

Idiap Research Institute, Switzerland

We present a comprehensive study of the linear prediction (LP) residual for speaker diarization under single and multiple distant microphone conditions in privacy-sensitive settings, a requirement for analyzing a wide range of spontaneous conversations. Two representations of the residual are compared, namely real cepstrum and MFCC, with the latter performing better. Experiments on RT06eval show that the residual, combined with subband information from 2.5 kHz to 3.5 kHz and spectral slope, yields performance close to that of traditional MFCC features. To objectively evaluate privacy in terms of linguistic information, we perform phoneme recognition: residual features yield lower phoneme accuracies than traditional MFCC features.
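For readers unfamiliar with the LP residual, the following is a minimal sketch of how a residual can be obtained from a single frame with the textbook autocorrelation method and the Levinson-Durbin recursion. It is not the authors' implementation. The function names (`autocorrelation`, `levinson_durbin`, `lp_residual`), the LP order, and the synthetic test frame are all assumptions made for illustration. The abstract does not specify the paper's windowing or LP order, or how the cepstral, MFCC, subband and spectral-slope features are derived from the residual, so none of that is reproduced here.

```python
import math


def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (zero outside the frame)."""
    n = len(frame)
    return [sum(frame[t] * frame[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations recursively.

    Returns (a, err) where a[0] == 1 and the prediction error is
    e[t] = sum_k a[k] * x[t - k]; err is the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predicted frame: stop early
            break
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err  # reflection coefficient for this order
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err


def lp_residual(frame, order=2):
    """Inverse-filter the frame with its own LP coefficients (hypothetical helper)."""
    r = autocorrelation(frame, order)
    a, _ = levinson_durbin(r, order)
    residual = [sum(a[k] * frame[t - k] for k in range(order + 1) if t - k >= 0)
                for t in range(len(frame))]
    return a, residual


# Illustrative input only: a synthetic sinusoid standing in for a speech frame.
frame = [math.sin(0.3 * t) for t in range(200)]
coeffs, residual = lp_residual(frame, order=2)
```

Inverse filtering removes most of the spectral envelope that the LP model captures, which is why the residual is expected to carry less linguistic (phonetic) information. In practice a higher LP order and a tapered analysis window would typically be used on real speech.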


Bibliographic reference.  Parthasarathi, Sree Hari Krishnan / Bourlard, Hervé / Gatica-Perez, Daniel (2011): "LP residual features for robust, privacy-sensitive speaker diarization", In INTERSPEECH-2011, 1045-1048.