Joint Speaker and Lexical Modeling for Short-Term Characterization of Speaker

Guangsen Wang, Kong Aik Lee, Trung Hieu Nguyen, Hanwu Sun, Bin Ma


For speech utterances of very short duration, speaker characterization depends strongly on the lexical content. In this context, speaker verification is typically performed by analyzing and matching a speaker's pronunciation of individual words, syllables, or phones. In this paper, we advocate the use of hidden Markov models (HMMs) for joint modeling of speaker characteristics and lexical content. We then develop a scoring model that scores only the speaker part rather than the joint speaker-lexical component, leading to better speaker verification performance. Experiments were conducted on the text-prompted task of the RSR2015 and RedDots datasets. In RSR2015, the prompted texts are limited to random sequences of digits. RedDots represents an unconstrained scenario in which the prompted texts are free-text sentences. Both datasets are publicly available.


DOI: 10.21437/Interspeech.2016-929

Cite as

Wang, G., Lee, K.A., Nguyen, T.H., Sun, H., Ma, B. (2016) Joint Speaker and Lexical Modeling for Short-Term Characterization of Speaker. Proc. Interspeech 2016, 415-419.

BibTeX
@inproceedings{Wang+2016,
author={Guangsen Wang and Kong Aik Lee and Trung Hieu Nguyen and Hanwu Sun and Bin Ma},
title={Joint Speaker and Lexical Modeling for Short-Term Characterization of Speaker},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-929},
url={http://dx.doi.org/10.21437/Interspeech.2016-929},
pages={415--419}
}