5th International Conference on Spoken Language Processing
This paper investigates a method of creating robust speaker models that are insensitive to session-dependent (SD) utterance variation and handset-dependent (HD) distortion, for HMM-based speaker verification systems operating over a real telephone network. We recently reported a method of creating session-independent (SI) speaker HMMs that are insensitive to SD utterance variation. That method introduces a distortion function that transforms SI speaker HMMs into SD speaker HMMs, and jointly estimates the parameters of this function and the speaker-HMM parameters using a speaker adaptive training algorithm. This paper proposes a method that is less sensitive to both SD utterance variation and HD distortion than the previous method. The new method takes into account that estimating the distortion-function parameters poses different difficulties for SD utterance variation than for HD distortion. In text-independent verification experiments using telephone speech data, the improved method achieves an error reduction rate of 24% compared with the conventional method, cepstral mean normalization.
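For orientation, the conventional baseline named above, cepstral mean normalization, can be sketched in a few lines. This is a minimal illustration of the standard technique, not the paper's proposed method: it assumes an utterance is given as a list of cepstral frames, and the function name `cepstral_mean_normalize` and the toy values are hypothetical. The sketch shows that a fixed additive offset in the cepstral domain, which is how a time-invariant channel such as a handset is commonly modeled, is removed by subtracting the per-utterance mean.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-utterance mean from every cepstral frame.

    frames: list of frames, each a list of cepstral coefficients
    (hypothetical input format chosen for this illustration).
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    # Mean of each cepstral coefficient over the whole utterance.
    mean = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


# Toy utterance: 3 frames of 2 cepstral coefficients (made-up values).
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]

# A constant channel offset added to every frame (assumed distortion model).
channel_offset = [0.5, -1.5]
distorted = [[c + o for c, o in zip(frame, channel_offset)] for frame in clean]

normalized_clean = cepstral_mean_normalize(clean)
normalized_distorted = cepstral_mean_normalize(distorted)
```

After normalization the clean and distorted utterances coincide, because the constant offset shifts only the mean. Distortion that varies within an utterance or between sessions is not captured by this single mean, which is the kind of variation the abstract's distortion functions are aimed at.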
Bibliographic reference. Matsui, Tomoko / Aikawa, Kiyoaki (1998): "Robust speaker verification insensitive to session-dependent utterance variation and handset-dependent distortion", In ICSLP-1998, paper 0714.