Acoustic variability across speakers arises from differences in their vocal tract characteristics. These individual speaker characteristics are reflected in the speech signal when speakers pronounce a given phoneme. The current work hypothesizes that clusters within a phoneme spoken by multiple speakers roughly correspond to different speakers. Based on this hypothesis, a Gaussian mixture model (GMM) based phoneme background model (PBM) is estimated. The components of such a PBM are used as the set of relevance variables in an information bottleneck based speaker diarization system. The PBM is estimated using phone transcripts obtained both from the ground truth and from an automatic speech recognition (ASR) system. Diarization experiments on meeting recordings from the AMI and NIST RT corpora show that the proposed method achieves significant improvements over a system using a background model that ignores phoneme information.
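The idea of pooling per-phoneme GMM components into a single set of relevance variables can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Gaussian`, `build_pbm`, and `relevance_posteriors`, the diagonal covariances, the toy parameters, and the choice to scale each component weight by a phoneme prior are all assumptions made for illustration, and the per-phoneme GMMs are assumed to have been estimated already.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Gaussian:
    """One diagonal-covariance component (hypothetical container)."""
    weight: float           # mixture weight
    mean: Sequence[float]
    var: Sequence[float]    # diagonal of the covariance matrix


def log_density(x: Sequence[float], g: Gaussian) -> float:
    """Log of a diagonal Gaussian density evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, g.mean, g.var)
    )


def build_pbm(phoneme_gmms: Dict[str, List[Gaussian]],
              phoneme_priors: Dict[str, float]) -> List[Gaussian]:
    """Pool the components of every per-phoneme GMM into one list.

    Assumption: each component weight is scaled by its phoneme prior
    so that the pooled weights sum to one.
    """
    pooled = []
    for phoneme, components in phoneme_gmms.items():
        for g in components:
            pooled.append(
                Gaussian(phoneme_priors[phoneme] * g.weight, g.mean, g.var)
            )
    return pooled


def relevance_posteriors(x: Sequence[float],
                         pbm: List[Gaussian]) -> List[float]:
    """Posterior over pooled components for one feature vector.

    Uses the max-subtraction trick to keep the exponentials stable.
    """
    logs = [math.log(g.weight) + log_density(x, g) for g in pbm]
    top = max(logs)
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: two phonemes, each with two clusters that, under the
# hypothesis above, would roughly correspond to two speakers.
phoneme_gmms = {
    "aa": [Gaussian(0.5, [0.0, 0.0], [1.0, 1.0]),
           Gaussian(0.5, [3.0, 3.0], [1.0, 1.0])],
    "s": [Gaussian(0.6, [-4.0, 2.0], [1.0, 1.0]),
          Gaussian(0.4, [-4.0, -2.0], [1.0, 1.0])],
}
phoneme_priors = {"aa": 0.5, "s": 0.5}

pbm = build_pbm(phoneme_gmms, phoneme_priors)
post = relevance_posteriors([2.8, 3.1], pbm)
```

In this sketch `post` holds one probability per pooled component. In an information bottleneck setup, such per-frame posteriors would typically be averaged over a speech segment to give that segment's distribution over the relevance variables.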
Bibliographic reference. Yella, Sree Harsha / Motlicek, Petr / Bourlard, Hervé (2014): "Phoneme background model for information bottleneck based speaker diarization", In INTERSPEECH-2014, 597-601.