Text Dependent Speaker Verification Using Un-Supervised HMM-UBM and Temporal GMM-UBM

Achintya Kr. Sarkar, Zheng-Hua Tan


In this paper, we investigate Hidden Markov Model (HMM) and temporal Gaussian Mixture Model (GMM) systems, both based on the Universal Background Model (UBM) concept, to capture the temporal information of speech for Text Dependent (TD) Speaker Verification (SV). In TD-SV, target speakers are constrained to use only predefined fixed sentence(s) during both enrollment and test. Temporal information is therefore important for utterance verification, i.e. for checking whether the test utterance has the same sequence of textual content as the utterance used during target enrollment. However, the classical GMM-UBM based TD-SV system does not consider temporal information. The HMM-UBM and temporal GMM-UBM based systems do capture it, and they require no transcription of the speech. We also study the fusion of the HMM-UBM, the temporal GMM-UBM and the classical GMM-UBM systems in SV. We show that the HMM-UBM system yields better performance than the other systems in most cases. Further, fusion of the systems improves the overall speaker verification performance. Results are reported on the different tasks of the RedDots challenge 2016 database.
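
To illustrate why a classical GMM-UBM score carries no temporal information, the following minimal sketch computes an average per-frame log-likelihood ratio between a target GMM and a UBM, then repeats the computation on the same frames in shuffled order. Everything here is an assumption made for illustration and is not taken from the paper: the toy two-component diagonal-covariance models, the random 2-dimensional "features", and the helper names `gmm_frame_loglik` and `avg_llr`.

```python
import math
import random

def gmm_frame_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    comp = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comp.append(ll)
    # log-sum-exp over mixture components for numerical stability
    m = max(comp)
    return m + math.log(sum(math.exp(c - m) for c in comp))

def avg_llr(frames, target, ubm):
    """Average per-frame log-likelihood ratio of target model versus UBM."""
    total = 0.0
    for x in frames:
        total += gmm_frame_loglik(x, *target) - gmm_frame_loglik(x, *ubm)
    return total / len(frames)

# Hypothetical toy models: (weights, means, variances)
ubm = ([0.5, 0.5], [[0.0, 0.0], [2.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]])
target = ([0.5, 0.5], [[0.3, -0.2], [2.4, 1.7]], [[1.0, 1.0], [1.0, 1.0]])

# Hypothetical "utterance": 50 random 2-D frames, plus a shuffled copy
rng = random.Random(0)
frames = [[rng.gauss(1.0, 1.5), rng.gauss(1.0, 1.5)] for _ in range(50)]
shuffled = frames[:]
rng.shuffle(shuffled)

score_original = avg_llr(frames, target, ubm)
score_shuffled = avg_llr(shuffled, target, ubm)
print(math.isclose(score_original, score_shuffled, abs_tol=1e-9))
```

Because the score is a sum over frames, reordering the frames leaves it unchanged up to floating-point rounding, so a frame-independent GMM score alone cannot tell whether the words were spoken in the enrolled order. Sequence-aware models such as HMMs are one way to address this.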


DOI: 10.21437/Interspeech.2016-362

Cite as

Sarkar, A.K., Tan, Z. (2016) Text Dependent Speaker Verification Using Un-Supervised HMM-UBM and Temporal GMM-UBM. Proc. Interspeech 2016, 425-429.

BibTeX
@inproceedings{Sarkar+2016,
author={Achintya Kr. Sarkar and Zheng-Hua Tan},
title={Text Dependent Speaker Verification Using Un-Supervised HMM-UBM and Temporal GMM-UBM},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-362},
url={http://dx.doi.org/10.21437/Interspeech.2016-362},
pages={425--429}
}