Temporal dynamics are an important feature of speech, distinguishing speech from noise as well as discriminating between different speakers. In this paper, we present an approach to extracting long-range temporal dynamics of speech for text-independent speaker recognition. We aim to maximize the noise immunity arising from the distinct temporal dynamics of speech. The new approach achieves this by identifying the longest matching segments between the training data and the test data for recognition. Additionally, the new approach combines Bayesian adaptation, multicondition training and missing-feature theory to further improve the modeling of noisy speech. Experiments have been conducted on the NIST 2002 SRE database in the presence of various types of noise, including fast-varying song and music. The new approach has shown improved performance over conventional noise-robust techniques.
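The core idea of identifying the longest matching segment can be sketched as a dynamic program over frame pairs, analogous to longest-common-substring matching. The following is a minimal illustration only: it assumes a Euclidean frame-distance threshold `tau` as the matching criterion, whereas the paper's actual criterion is model-based and is not reproduced here.

```python
import numpy as np

def longest_matching_segment(train, test, tau=1.0):
    """Find the longest run of consecutively matching frames between
    two feature sequences (illustrative sketch, not the paper's method).

    L[i, j] holds the length of the longest matching run ending at
    frame pair (train[i-1], test[j-1]); a pair matches when its
    Euclidean distance is below tau.
    """
    n, m = len(train), len(test)
    L = np.zeros((n + 1, m + 1), dtype=int)
    best_len, best_end = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if np.linalg.norm(train[i - 1] - test[j - 1]) < tau:
                L[i, j] = L[i - 1, j - 1] + 1
                if L[i, j] > best_len:
                    best_len, best_end = int(L[i, j]), (i, j)
    i, j = best_end
    # Return the segment length and the (start, end) frame spans
    # in the training and test sequences respectively.
    return best_len, (i - best_len, i), (j - best_len, j)
```

For example, if the training and test sequences share a three-frame stretch of near-identical feature vectors, the function returns length 3 together with the frame spans of that stretch in each sequence.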
Bibliographic reference. Jafari, Ayeh / Srinivasan, Ramji / Crookes, Danny / Ming, Ji (2011): "A longest matching segment approach with Bayesian adaptation - application to noise-robust speaker recognition", In INTERSPEECH-2011, 2749-2752.