Statistical Modeling of Speaker’s Voice with Temporal Co-Location for Active Voice Authentication

Zhong Meng, Biing-Hwang Juang


Active voice authentication (AVA) is a new mode of talker authentication in which authentication is performed continuously on very short segments of the voice signal, any of which may reflect an instantaneous change of talker. AVA is necessary for real-time monitoring of a device authorized for a particular user. The authentication test thus cannot rely on a long history of the voice data or on any past decisions. Most conventional voice authentication techniques, including i-vector, assume that the entire test utterance comes from a single talker with a claimed identity, and therefore fail to meet this stringent requirement. This paper presents a different signal modeling technique, within a conditional vector-quantization framework and with matching short-time statistics that take into account the co-located speech codes, to meet the new challenge. As one variation, the temporally co-located VQ (TC-VQ) associates each codeword with a set of Gaussian mixture models to account for the co-located distributions, and a temporally co-located hidden Markov model (TC-HMM) is built upon the TC-VQ. The proposed technique achieves a window-based equal error rate in the range of 3–5% and a relative gain of 4–25% over a baseline system using traditional HMMs on the AVA database.
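
As an illustration of the two figures of merit quoted above, the following minimal sketch shows one common way to estimate an equal error rate from per-window scores and to express a relative gain over a baseline. It is not the authors' evaluation code: the function names, the threshold sweep, the convention that higher scores favor the target talker, and all numeric values are assumptions made for this example only.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption: each score belongs to one test window, and a higher score
    means the window is more likely to come from the authorized talker.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: a genuine window scores below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: an impostor window scores at or above it.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_gain(baseline_eer, proposed_eer):
    """Relative EER reduction of a proposed system over a baseline."""
    return (baseline_eer - proposed_eer) / baseline_eer


# Hypothetical per-window scores, not taken from the AVA database.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine, impostor)

# Hypothetical EERs chosen only to show the arithmetic of a relative gain.
gain = relative_gain(0.05, 0.04)
```

With only a handful of scores the estimate is coarse; a practical evaluation would use many windows per talker and may interpolate between adjacent thresholds.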


DOI: 10.21437/Interspeech.2016-650

Cite as

Meng, Z., Juang, B. (2016) Statistical Modeling of Speaker’s Voice with Temporal Co-Location for Active Voice Authentication. Proc. Interspeech 2016, 1725-1729.

Bibtex
@inproceedings{Meng+2016,
author={Zhong Meng and Biing-Hwang Juang},
title={Statistical Modeling of Speaker’s Voice with Temporal Co-Location for Active Voice Authentication},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-650},
url={http://dx.doi.org/10.21437/Interspeech.2016-650},
pages={1725--1729}
}