Speaker Verification Using Short Utterances with DNN-Based Estimation of Subglottal Acoustic Features

Jinxi Guo, Gary Yeung, Deepak Muralidharan, Harish Arsikere, Amber Afshan, Abeer Alwan


Speaker verification in real-world applications sometimes has to work with enrollment and/or test data of limited duration. MFCC-based i-vector systems have defined the state of the art for speaker verification, but they are well known to be less effective with short utterances. To address this issue, we propose a method that leverages the speaker specificity and stationarity of subglottal acoustics. First, we present a deep neural network (DNN) based approach to estimating subglottal features from speech signals. The approach involves training a DNN regression model that maps the log filter-bank coefficients of a given speech signal to those of its corresponding subglottal signal. Cross-validation experiments on the WashU-UCLA corpus (which contains parallel recordings of speech and subglottal acoustics) show the effectiveness of our DNN-based estimation algorithm: the average correlation coefficient between the actual and estimated subglottal filter-bank coefficients is 0.9. Second, a score-level fusion of the MFCC and subglottal-feature systems in the i-vector PLDA framework yields statistically significant improvements over the MFCC-only baseline. On the NIST SRE 08 truncated 10 sec–10 sec and 5 sec–5 sec core evaluation tasks, the relative reduction in equal error rate ranges from 6% to 14% for the conditions tested, with both microphone and telephone speech.
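
To illustrate the evaluation quantities mentioned in the abstract, the following minimal Python sketch shows one way a score-level fusion and a relative reduction in equal error rate (EER) could be computed. The function names, the weighted-sum fusion rule, the fusion weight, and the threshold-sweep approximation of the EER are all assumptions made for illustration; the abstract does not state which fusion rule or EER procedure the authors used.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))

    # False rejection rate: fraction of target trials below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False acceptance rate: fraction of non-target trials at or above it.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


def fuse_scores(scores_a, scores_b, weight=0.5):
    """Weighted-sum fusion of two systems' trial scores.

    The weighted sum and the default weight are hypothetical choices,
    not taken from the paper.
    """
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    return weight * scores_a + (1.0 - weight) * scores_b


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction over the baseline, in percent."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer
```

In such a setup, the scores of the MFCC system and the subglottal-feature system for the same trials would be passed to `fuse_scores`, and `relative_reduction` would compare the EER of the fused scores with that of the MFCC-only scores. In practice the two score streams would typically be calibrated or normalized before fusion, a step this sketch leaves out.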


DOI: 10.21437/Interspeech.2016-282

Cite as

Guo, J., Yeung, G., Muralidharan, D., Arsikere, H., Afshan, A., Alwan, A. (2016) Speaker Verification Using Short Utterances with DNN-Based Estimation of Subglottal Acoustic Features. Proc. Interspeech 2016, 2219-2222.

BibTeX
@inproceedings{Guo+2016,
author={Jinxi Guo and Gary Yeung and Deepak Muralidharan and Harish Arsikere and Amber Afshan and Abeer Alwan},
title={Speaker Verification Using Short Utterances with DNN-Based Estimation of Subglottal Acoustic Features},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-282},
url={http://dx.doi.org/10.21437/Interspeech.2016-282},
pages={2219--2222}
}