A Low-Power Text-Dependent Speaker Verification System with Narrow-Band Feature Pre-Selection and Weighted Dynamic Time Warping

Qing He, Gregory Wornell, Wei Ma


To fully enable voice interaction in wearable devices, a system requires low-power, customizable voice-authenticated wake-up. Existing speaker-verification (SV) methods have shortcomings relating to power consumption and noise susceptibility. To meet the application requirements, we propose a low-power, text-dependent SV system comprising a sparse spectral feature extraction front-end showing improved noise robustness and accuracy at low power, and a back-end running an improved dynamic time warping (DTW) algorithm that preserves signal envelope while reducing misalignments. Without background noise, the proposed system achieves an equal-error-rate (EER) of 1.1%, compared to 1.4% with a conventional Mel-frequency cepstral coefficients (MFCC)+DTW system and 2.6% with a Gaussian mixture universal background (GMM-UBM) based system. At 3dB signal-to-noise ratio (SNR), the proposed system achieves an EER of 5.7%, compared to 13% with a conventional MFCC+DTW system and 6.8% with a GMM-UBM based system. The proposed system enables simple, low-power implementation such that the power consumption of the end-to-end system, which includes a voice activity detector, feature extraction front-end, and back-end decision unit, is under 380 uW.


DOI: 10.21437/Odyssey.2016-1

Cite as

He, Q., Wornell, G., Ma, W. (2016) A Low-Power Text-Dependent Speaker Verification System with Narrow-Band Feature Pre-Selection and Weighted Dynamic Time Warping. Proc. Odyssey 2016, 1-8.

Bibtex
@inproceedings{He+2016,
author={Qing He and Gregory Wornell and Wei Ma},
title={A Low-Power Text-Dependent Speaker Verification System with Narrow-Band Feature Pre-Selection and Weighted Dynamic Time Warping},
year=2016,
booktitle={Odyssey 2016},
doi={10.21437/Odyssey.2016-1},
url={http://dx.doi.org/10.21437/Odyssey.2016-1},
pages={1--8}
}