5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

A Segment-Based Speaker Verification System Using SUMMIT

Sridevi V. Sarma, Victor W. Zue

Spoken Language Systems Group, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA

The main goal of this work is to develop a competitive, computationally efficient segment-based speaker verification system. To achieve this goal, we modified SUMMIT [12] to suit our needs. The speech signal was first transformed into a hierarchical segment network using frame-based measurements. Next, acoustic models for 168 speakers were developed for a set of 6 broad phoneme classes. The models represented feature statistics with diagonal Gaussians, preceded by principal component analysis. The feature vector included segment-averaged MFCCs plus three prosodic measurements: energy, fundamental frequency (F0), and duration. The size and content of the feature vector were determined by a greedy search that optimized overall speaker verification performance. We achieved an equal error rate (EER) of 2.74% when cohorts were used during testing, and an EER of 1.59% when all speakers were used during testing. We reduced computation significantly by using a small number of features, a small number of phonetic models per speaker, few model parameters, and few competing speakers during testing (when cohorts are used).
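The abstract names the modeling ingredients (a PCA rotation, diagonal Gaussians per broad phoneme class, cohort or all-speaker competition, EER) but not the exact scoring rule. The following is a minimal Python/NumPy sketch under stated assumptions, not the authors' implementation: a single global PCA rotation fitted on pooled training segments, one diagonal Gaussian per broad class per speaker, a verification score equal to the claimant's log-likelihood minus the average cohort log-likelihood (normalized per segment), and an EER found by sweeping a decision threshold. All function names and the variance floor are hypothetical.

```python
import numpy as np

def fit_pca(X):
    """Global PCA (assumed): return the data mean and a rotation whose
    columns are principal axes ordered by decreasing variance."""
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return mu, eigvecs[:, order]

def project(X, mu, W):
    """Rotate segment feature vectors into the PCA basis (no dimensions dropped)."""
    return (X - mu) @ W

def train_speaker_model(Z, labels, var_floor=1e-3):
    """One diagonal Gaussian (mean, variance) per broad phoneme class.
    The variance floor is a hypothetical safeguard, not from the paper."""
    model = {}
    for c in np.unique(labels):
        Zc = Z[labels == c]
        model[c] = (Zc.mean(axis=0), np.maximum(Zc.var(axis=0), var_floor))
    return model

def log_likelihood(Z, labels, model):
    """Sum of diagonal-Gaussian log densities; each segment is scored by the
    model of its own class (classes missing from the model are skipped)."""
    total = 0.0
    for c, (mean, var) in model.items():
        Zc = Z[labels == c]
        total += np.sum(-0.5 * (np.log(2 * np.pi * var) + (Zc - mean) ** 2 / var))
    return float(total)

def verification_score(Z, labels, claimant, cohort):
    """Per-segment log-likelihood ratio of the claimant model against the
    average cohort log-likelihood (an assumed normalization)."""
    target = log_likelihood(Z, labels, claimant)
    background = np.mean([log_likelihood(Z, labels, m) for m in cohort])
    return (target - background) / len(Z)

def equal_error_rate(target_scores, impostor_scores):
    """Sweep the threshold over every observed score; return the mean of the
    false-acceptance and false-rejection rates where they are closest."""
    t = np.asarray(target_scores, dtype=float)
    i = np.asarray(impostor_scores, dtype=float)
    best_gap, eer = np.inf, 1.0
    for thr in np.sort(np.concatenate([t, i])):
        frr = np.mean(t < thr)   # true speakers rejected
        far = np.mean(i >= thr)  # impostors accepted
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return float(eer)
```

In use, one would collect `verification_score` values for true-speaker and impostor trials and pass both lists to `equal_error_rate`. Replacing the cohort list with every other enrolled speaker corresponds to the all-speaker condition. Because each segment is scored only by the Gaussian for its own broad class, each trial needs one diagonal-Gaussian evaluation per segment per competing model, which is why a short cohort list keeps the cost low.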


Bibliographic reference.  Sarma, Sridevi V. / Zue, Victor W. (1997): "A segment-based speaker verification system using SUMMIT", In EUROSPEECH-1997, 843-846.