12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Constrained Cepstral Speaker Recognition Using Matched UBM and JFA Training

Michelle Hewlett Sanchez, Luciana Ferrer, Elizabeth Shriberg, Andreas Stolcke

SRI International, USA

We study constrained speaker recognition systems, that is, systems that model standard cepstral features falling within particular types of speech regions. A question in building such systems is whether to constrain universal background model (UBM) training, joint factor analysis (JFA) training, or both. We explore this question, as well as how to optimize UBM size, using a corpus of Arabic male speakers. Over a large set of phonetic and prosodic constraints, we find that a system using constrained JFA and UBM performs on average 5.24% better than one using constraint-independent (all-frames) JFA and UBM. We find further improvement from optimizing UBM size based on the percentage of frames covered by the constraint.
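
As a rough illustration of the idea of a constraint and its frame coverage, the following minimal Python sketch selects the frames that satisfy a constraint and computes the percentage of frames it covers. It is not taken from the paper: the function names, the boolean-mask representation of a constraint, and in particular the `scaled_ubm_size` rule (scale the Gaussian count in proportion to coverage, rounded down to a power of two) are assumptions made only for illustration. The abstract does not state how UBM size is actually chosen.

```python
from typing import List, Sequence


def select_constrained_frames(
    features: Sequence[Sequence[float]], mask: Sequence[bool]
) -> List[Sequence[float]]:
    """Keep only the feature frames whose mask entry is True."""
    if len(features) != len(mask):
        raise ValueError("features and mask must have the same length")
    return [frame for frame, keep in zip(features, mask) if keep]


def coverage_percent(mask: Sequence[bool]) -> float:
    """Percentage of frames covered by the constraint (0.0 for an empty mask)."""
    if not mask:
        return 0.0
    return 100.0 * sum(1 for keep in mask if keep) / len(mask)


def scaled_ubm_size(full_size: int, coverage: float, min_size: int = 32) -> int:
    """Hypothetical heuristic, NOT the paper's rule: scale the number of
    Gaussians in proportion to coverage, rounded down to a power-of-two
    multiple of min_size (min_size itself is assumed to be a power of two)."""
    target = max(min_size, full_size * coverage / 100.0)
    size = min_size
    while size * 2 <= target:
        size *= 2
    return size


# Toy example: four 2-dimensional "cepstral" frames, two of which
# satisfy some (hypothetical) phonetic or prosodic constraint.
features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
mask = [True, False, True, False]

constrained = select_constrained_frames(features, mask)
coverage = coverage_percent(mask)
ubm_size = scaled_ubm_size(1024, coverage)
```

In a matched setup as described in the abstract, the same constrained frames would feed both UBM and JFA training; in the constraint-independent setup, all frames would be used instead.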

Bibliographic reference. Sanchez, Michelle Hewlett / Ferrer, Luciana / Shriberg, Elizabeth / Stolcke, Andreas (2011): "Constrained cepstral speaker recognition using matched UBM and JFA training", In INTERSPEECH-2011, 141-144.