EUROSPEECH 2003 - INTERSPEECH 2003
In speaker verification over public telephone networks, utterances can be obtained from different types of handsets, and different handsets may introduce different degrees of distortion to the speech signals. This paper attempts to combine a handset selector with (1) handset-specific transformations and (2) handset-dependent speaker models to reduce the effect of this acoustic distortion. Specifically, a number of Gaussian mixture models are independently trained to identify the most likely handset given a test utterance. During recognition, the speaker model and background model are then either transformed by an MLLR-based handset-specific transformation or replaced, respectively, by a handset-dependent speaker model and a handset-dependent background model whose parameters were adapted by reinforced learning to fit the new environment. Experimental results based on 150 speakers of the HTIMIT corpus show that environment adaptation based on both MLLR and reinforced learning outperforms the classical CMS, Hnorm and Tnorm approaches, with MLLR adaptation achieving the best performance.
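The handset selector described in the abstract can be read as a maximum-likelihood choice among per-handset Gaussian mixture models. The sketch below is a minimal illustration of that idea, not the authors' implementation. It assumes diagonal covariances, scores an utterance by its average per-frame log-likelihood, and uses toy two-dimensional models; the handset labels `carbon` and `electret` and all function names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of `frames` under one GMM.

    `gmm` is a list of (weight, mean, variance) tuples, one per component.
    """
    total = 0.0
    for x in frames:
        comp = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(comp)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comp))
    return total / len(frames)


def select_handset(frames, handset_gmms):
    """Return the handset whose GMM best explains the utterance, plus all scores."""
    scores = {
        name: gmm_log_likelihood(frames, gmm)
        for name, gmm in handset_gmms.items()
    }
    return max(scores, key=scores.get), scores


# Toy models (assumed values, for illustration only).
handset_gmms = {
    "carbon": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "electret": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])],
}

# A toy "utterance": three two-dimensional feature frames.
utterance = [[4.2, 3.9], [4.8, 5.1], [4.5, 4.4]]

best, scores = select_handset(utterance, handset_gmms)
print(best)
```

In a system of the kind the abstract describes, the selected handset label would then determine which handset-specific transformation or handset-dependent model pair is used for scoring. That step is not shown here.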
Bibliographic reference. Yiu, Kwok-Kwong / Mak, Man-Wai / Kung, Sun-Yuan (2003): "Environment adaptation for robust speaker verification", In EUROSPEECH-2003, 2973-2976.