8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Duration and Pronunciation Conditioned Lexical Modeling for Speaker Verification

Gokhan Tur, Elizabeth Shriberg, Andreas Stolcke, Sachin Kajarekar

SRI International, USA

We propose a method to improve speaker recognition lexical model performance using acoustic-prosodic information. More specifically, the lexical model is trained using duration- and pronunciation-conditioned word N-grams, simultaneously modeling lexical information along with its acoustic and prosodic characteristics. Support vector machines are used for modeling and scoring, with N-gram frequency vectors serving as features. Experimental results using NIST Speaker Recognition Evaluation data sets show that this method outperforms the regular word N-gram-based lexical models. Furthermore, our approach provides complementary information when combined with a high-accuracy acoustic speaker model. We believe that this is a promising step toward integrated speaker recognition models that combine multiple types of high-level features.
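As a rough illustration of the feature type described in the abstract, the sketch below turns (word, duration) pairs into duration-conditioned tokens and computes N-gram relative frequencies that could serve as input to a support vector machine. It is not the authors' implementation: the bin edges in `DURATION_BINS`, the `word|label` token format, the joint normalization over unigrams and bigrams, and all function names are assumptions made for this example. Pronunciation conditioning is omitted; it could presumably be attached to each token in the same way.

```python
from collections import Counter

# Hypothetical bin edges in seconds; the abstract does not specify a binning scheme.
DURATION_BINS = (0.15, 0.40)


def duration_label(duration_s):
    """Map a word duration to a coarse label (illustrative binning only)."""
    if duration_s < DURATION_BINS[0]:
        return "short"
    if duration_s < DURATION_BINS[1]:
        return "medium"
    return "long"


def conditioned_tokens(words):
    """Turn (word, duration) pairs into tokens such as 'yeah|long'."""
    return [f"{word}|{duration_label(dur)}" for word, dur in words]


def ngram_frequencies(tokens, n=2):
    """Relative frequencies of all N-grams of order 1..n (assumed joint normalization)."""
    counts = Counter()
    for order in range(1, n + 1):
        for i in range(len(tokens) - order + 1):
            counts[tuple(tokens[i:i + order])] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


def to_vector(freqs, vocabulary):
    """Project N-gram frequencies onto a fixed, ordered vocabulary."""
    return [freqs.get(gram, 0.0) for gram in vocabulary]


# Toy conversation side: (word, duration in seconds); values are made up.
conversation = [("yeah", 0.52), ("i", 0.08), ("know", 0.21), ("yeah", 0.12)]
tokens = conditioned_tokens(conversation)
freqs = ngram_frequencies(tokens, n=2)
vocabulary = sorted(freqs)
vector = to_vector(freqs, vocabulary)
```

Note that the two occurrences of "yeah" become distinct tokens (`yeah|long` and `yeah|short`), which is how conditioning lets a purely lexical feature carry prosodic information. In practice the vocabulary would be fixed from training data, and one such vector per conversation side would be passed to the SVM.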

Bibliographic reference. Tur, Gokhan / Shriberg, Elizabeth / Stolcke, Andreas / Kajarekar, Sachin (2007): "Duration and pronunciation conditioned lexical modeling for speaker verification", In INTERSPEECH-2007, 2049-2052.