ISCA Archive Interspeech 2017

DNN i-Vector Speaker Verification with Short, Text-Constrained Test Utterances

Jinghua Zhong, Wenping Hu, Frank K. Soong, Helen Meng

We investigate how to improve the performance of DNN i-vector based speaker verification for short, text-constrained test utterances, e.g., connected digit strings. Because of its smaller, limited vocabulary, text-constrained verification can deliver better performance than text-independent verification on short utterances. We study how well a “phonetically aware” Deep Neural Network (DNN) performs “stochastic phonetic alignment” when constructing supervectors and estimating the corresponding i-vectors, using two speech databases: a large-vocabulary, conversational, speaker-independent database (Fisher) and a small-vocabulary, continuous digit database (RSR2015 Part III). We compare phonetic alignment efficiency and the resulting speaker verification performance across senone sets of different sizes, each of which can characterize the phonetic pronunciations of utterances in the two databases. On the RSR2015 Part III evaluation, using only digit-related senones yields relative EER improvements of 7.89% for male speakers and 3.54% for female speakers. We also study DNN bottleneck features to investigate their capability to extract phonetically sensitive information that is useful for text-independent or text-constrained speaker verification. We found that combining MFCCs with bottleneck features in tandem reduces EERs further.
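As a rough illustration of the “stochastic phonetic alignment” idea, the sketch below accumulates the zeroth- and first-order statistics that i-vector extraction usually starts from, with DNN senone posteriors acting as the per-frame soft alignment. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `accumulate_stats`, the `keep` parameter, and in particular the choice to renormalize posteriors over a retained senone subset (e.g., digit-related senones) are hypothetical and not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def accumulate_stats(
    features: Sequence[Sequence[float]],
    posteriors: Sequence[Sequence[float]],
    keep: Optional[Sequence[int]] = None,
) -> Tuple[List[float], List[List[float]]]:
    """Accumulate zeroth- and first-order statistics for one utterance.

    features:   one acoustic feature vector per frame (e.g., MFCCs).
    posteriors: one senone posterior vector per frame (DNN outputs).
    keep:       optional senone indices to retain. ASSUMPTION: posteriors
                are renormalized over the retained senones in each frame.
    """
    if len(features) != len(posteriors):
        raise ValueError("features and posteriors need one row per frame")
    if not features:
        return [], []

    num_senones = len(posteriors[0])
    keep_ids = list(range(num_senones)) if keep is None else list(keep)
    dim = len(features[0])

    zeroth = [0.0] * len(keep_ids)
    first = [[0.0] * dim for _ in keep_ids]

    for x, post in zip(features, posteriors):
        kept = [post[k] for k in keep_ids]
        total = sum(kept)
        if total <= 0.0:
            # No posterior mass on the retained senones: skip this frame.
            continue
        for c, p in enumerate(kept):
            gamma = p / total  # soft alignment of this frame to senone c
            zeroth[c] += gamma
            for d in range(dim):
                first[c][d] += gamma * x[d]
    return zeroth, first
```

Calling `accumulate_stats(features, posteriors, keep=digit_senone_ids)`, where `digit_senone_ids` is a hypothetical list of senone indices, restricts the statistics to that subset; omitting `keep` uses the full senone set.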


doi: 10.21437/Interspeech.2017-1036

Cite as: Zhong, J., Hu, W., Soong, F.K., Meng, H. (2017) DNN i-Vector Speaker Verification with Short, Text-Constrained Test Utterances. Proc. Interspeech 2017, 1507-1511, doi: 10.21437/Interspeech.2017-1036

@inproceedings{zhong17_interspeech,
  author={Jinghua Zhong and Wenping Hu and Frank K. Soong and Helen Meng},
  title={{DNN i-Vector Speaker Verification with Short, Text-Constrained Test Utterances}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1507--1511},
  doi={10.21437/Interspeech.2017-1036}
}