September 22-25, 1997
The main goal of this work is to develop a competitive segment-based speaker verification system that is computationally efficient. To achieve this goal, we modified SUMMIT to suit our needs. The speech signal was first transformed into a hierarchical segment network using frame-based measurements. Next, acoustic models for 168 speakers were developed for a set of 6 broad phoneme classes. The models represented feature statistics with diagonal Gaussians, preceded by principal component analysis. The feature vector included segment-averaged MFCCs, plus three prosodic measurements: energy, fundamental frequency (F0), and duration. The size and content of the feature vector were determined by a greedy algorithm that optimized overall speaker verification performance. We achieved an equal error rate (EER) of 2.74% using cohorts during testing, and 1.59% EER using all speakers during testing. We reduced computation significantly through the use of a small number of features, a small number of phonetic models per speaker, few model parameters, and few competing speakers during testing (when cohorts are used).
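The greedy feature search and the EER criterion mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not say whether the search was forward or backward, nor what stopping rule was used, so the forward selection with a stop-on-no-improvement rule shown here is an assumption. The names `equal_error_rate`, `greedy_select`, and the `evaluate` callback are hypothetical and are not taken from the paper or from SUMMIT.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where false acceptance and false rejection are closest.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: true-speaker trials scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: impostor trials scoring at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def greedy_select(candidates, evaluate):
    """Forward greedy selection (assumed variant, not confirmed by the paper).

    `evaluate(features)` is a hypothetical callback that trains and tests a
    verifier on the given feature list and returns an error rate such as EER.
    At each step the single feature that lowers the error most is added; the
    search stops when no remaining feature improves the error.
    """
    selected, best_err = [], float("inf")
    remaining = list(candidates)
    while remaining:
        trials = [(evaluate(selected + [f]), f) for f in remaining]
        err, feat = min(trials, key=lambda pair: pair[0])
        if err >= best_err:
            break
        selected.append(feat)
        remaining.remove(feat)
        best_err = err
    return selected, best_err
```

In practice `evaluate` would score held-out utterances against the speaker models and pass the resulting target and impostor scores to `equal_error_rate`. Each greedy step costs one evaluation per remaining candidate, which keeps the search tractable compared with testing every feature subset.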
Bibliographic reference. Sarma, Sridevi V. / Zue, Victor W. (1997): "A segment-based speaker verification system using SUMMIT", In EUROSPEECH-1997, 843-846.