A novel statistical modeling and compensation method for robust speaker recognition is presented. The method specifically addresses the degradation in speaker verification performance caused by channel mismatch (e.g., different telephone handsets) between enrollment and testing sessions. In mismatched conditions, the new approach uses speaker-independent channel transformations to synthesize a speaker model that corresponds to the channel of the testing session. In effect, verification is always performed in matched channel conditions. Results on the 1998 NIST Speaker Recognition Evaluation corpus show that the new approach yields performance that matches the best reported results. Specifically, our approach yields improvements similar to those of the HNORM score-based compensation method (a 19.9% reduction in equal error rate (EER) compared to cepstral mean normalization (CMN) alone), but in a fraction of the training time.
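The abstract does not spell out the form of the channel transformations, so the following is only a minimal sketch of the idea under stated assumptions: speaker and channel-dependent background models are diagonal-covariance Gaussian mixtures whose components correspond one-to-one, and the transformation from channel A to channel B shifts each mean by the difference between the two background means and rescales each weight and variance by the ratio of the background values. The names `DiagGMM` and `synthesize`, and the carbon/electret handset labels, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiagGMM:
    """Diagonal-covariance Gaussian mixture: one entry per component."""
    weights: List[float]
    means: List[List[float]]
    variances: List[List[float]]


def synthesize(speaker: DiagGMM, src_bg: DiagGMM, dst_bg: DiagGMM) -> DiagGMM:
    """Map a speaker model enrolled on the source channel to the destination channel.

    Assumption (not from the abstract): component i of every model refers to
    the same acoustic region, so the speaker-independent change between the
    two background models can be applied component by component.
    """
    # Rescale weights by the background weight ratio, then renormalize to sum to 1.
    raw = [w * (wb / wa)
           for w, wa, wb in zip(speaker.weights, src_bg.weights, dst_bg.weights)]
    total = sum(raw)
    weights = [w / total for w in raw]

    # Shift each mean by the difference between the background means.
    means = [[m + (mb - ma) for m, ma, mb in zip(mu, mu_a, mu_b)]
             for mu, mu_a, mu_b in zip(speaker.means, src_bg.means, dst_bg.means)]

    # Rescale each variance by the ratio of the background variances.
    variances = [[v * (vb / va) for v, va, vb in zip(var, var_a, var_b)]
                 for var, var_a, var_b in zip(speaker.variances,
                                              src_bg.variances,
                                              dst_bg.variances)]
    return DiagGMM(weights, means, variances)


# Toy one-dimensional, two-component models (illustrative values only).
carbon_bg = DiagGMM([0.5, 0.5], [[0.0], [2.0]], [[1.0], [1.0]])
electret_bg = DiagGMM([0.5, 0.5], [[1.0], [3.0]], [[2.0], [2.0]])
speaker_carbon = DiagGMM([0.4, 0.6], [[0.5], [2.5]], [[0.8], [1.2]])

# Enrollment was on a carbon handset; the test call uses an electret handset.
speaker_electret = synthesize(speaker_carbon, carbon_bg, electret_bg)
```

Under these assumptions, a test utterance from an electret handset would be scored against `speaker_electret` rather than `speaker_carbon`, so scoring takes place in matched conditions even though the speaker never enrolled on that handset type.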
Cite as: Teunen, R., Shahshahani, B., Heck, L. (2000) A model-based transformational approach to robust speaker recognition. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 2, 495-498, doi: 10.21437/ICSLP.2000-315
@inproceedings{teunen00_icslp,
  author={Remco Teunen and Ben Shahshahani and Larry Heck},
  title={{A model-based transformational approach to robust speaker recognition}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={2},
  pages={495--498},
  doi={10.21437/ICSLP.2000-315}
}