Most conventional speaker recognition systems rely on short-term spectral information and ignore long-term information such as prosody, which also conveys speaker information. In this paper, we propose an approach that extracts prosodic features from long-term information. First, wavelet analysis is applied to reveal the trends of the f0 and energy contours. The prosodic features are then extracted only from the approximation coefficients. We use these features in a GMM-UBM based text-independent speaker verification system. The proposed method achieves an equal error rate (EER) of 23.3% on the NIST 2004 8sides-1side task. This result is promising, since the baseline system, which uses short-term f0 features, reaches an EER of 33.49% on the same task.
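As a rough illustration of the idea of keeping only approximation coefficients, the following minimal sketch decomposes a short f0 contour with a Haar wavelet and derives two simple trend statistics. The choice of the Haar wavelet, the number of levels, the toy contour values, and the function names `haar_approximation` and `trend_features` are all assumptions made for this example; the abstract does not specify the wavelet family, the decomposition depth, or the exact features used.

```python
import math


def haar_approximation(signal, levels):
    """Return Haar approximation coefficients after `levels` decompositions.

    Assumption: an orthonormal Haar wavelet; the abstract does not name
    the wavelet actually used.
    """
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break  # nothing left to decompose
        if len(approx) % 2:
            approx.append(approx[-1])  # pad odd lengths by repeating the last sample
        # Low-pass step: scaled sum of each adjacent pair (detail coefficients are dropped).
        approx = [
            (approx[i] + approx[i + 1]) / math.sqrt(2)
            for i in range(0, len(approx), 2)
        ]
    return approx


def trend_features(approx):
    """Hypothetical long-term features: mean level and mean first difference."""
    mean = sum(approx) / len(approx)
    deltas = [b - a for a, b in zip(approx, approx[1:])]
    slope = sum(deltas) / len(deltas) if deltas else 0.0
    return mean, slope


# Toy f0 contour in Hz (made-up values for illustration only).
f0 = [120.0, 122.0, 125.0, 123.0, 130.0, 134.0, 131.0, 129.0]
approx = haar_approximation(f0, levels=2)
print(approx)
print(trend_features(approx))
```

Each decomposition level halves the number of coefficients and discards the detail (fast-varying) part, so the remaining values follow the slow trend of the contour. The same function could be applied to an energy contour.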
Cite as: Chen, J., Dai, B., Sun, J. (2005) Prosodic features based on wavelet analysis for speaker verification. Proc. Interspeech 2005, 3093-3096, doi: 10.21437/Interspeech.2005-664
@inproceedings{chen05e_interspeech,
  author={Jixu Chen and Beiqian Dai and Jun Sun},
  title={{Prosodic features based on wavelet analysis for speaker verification}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={3093--3096},
  doi={10.21437/Interspeech.2005-664}
}