ISCA Archive Interspeech 2013

Using group delay functions from all-pole models for speaker recognition

Padmanabhan Rajan, Tomi Kinnunen, Cemal Hanilçi, Jouni Pohjalainen, Paavo Alku

Popular features for speech processing, such as mel-frequency cepstral coefficients (MFCCs), are derived from the short-term magnitude spectrum, whereas the phase spectrum is typically left unused. The common argument for using only the magnitude spectrum is that the human ear is phase-deaf, but phase-based features have also remained less explored because of the additional signal processing difficulties they introduce. A useful representation of the phase is the group delay function, but computing it robustly remains difficult. This paper advocates the use of group delay functions derived from parametric all-pole models instead of their direct computation from the discrete Fourier transform. Using a subset of the vocal effort data in the NIST 2010 speaker recognition evaluation (SRE) corpus, we show that group delay features derived via parametric all-pole models improve recognition accuracy, especially under high vocal effort. Additionally, the group delay features provide comparable or improved accuracy over conventional magnitude-based MFCC features. Thus, the use of group delay functions derived from all-pole models provides an effective way to utilize information from the phase spectrum of speech signals.
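To make the idea of a model-based group delay concrete, here is a minimal sketch assuming plain autocorrelation-method linear prediction as the all-pole estimator. The paper may use other all-pole models, and the function names `lpc_autocorr` and `allpole_group_delay`, the model order 12 and the FFT size 512 are illustrative choices, not taken from the paper. For an all-pole model H(z) = 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, the group delay τ(ω) = -dθ(ω)/dω can be computed without phase unwrapping as τ_H(ω) = -Re{ DFT(n·a[n]) / DFT(a[n]) }. Because the autocorrelation method yields a minimum-phase A(z), its zeros lie strictly inside the unit circle, so the denominator does not vanish on the unit circle. This avoids the spikes that zeros close to the unit circle cause when the group delay is computed directly from the DFT of the windowed speech frame.

```python
import numpy as np


def lpc_autocorr(frame: np.ndarray, order: int) -> np.ndarray:
    """Return A(z) coefficients [1, a_1, ..., a_p] via the autocorrelation
    method and the Levinson-Durbin recursion (illustrative sketch)."""
    n = len(frame)
    # Biased autocorrelation r[0..order] of the (already windowed) frame.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # assumes a non-silent frame (r[0] > 0)
    for i in range(1, order + 1):
        # Reflection coefficient for step i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k  # updated prediction error power
    return a


def allpole_group_delay(a: np.ndarray, n_fft: int = 512) -> np.ndarray:
    """Group delay (in samples) of H(z) = 1/A(z) on n_fft//2 + 1 bins
    spanning [0, pi]. Uses tau_A = Re{DFT(n*a[n]) / DFT(a[n])} and
    tau_H = -tau_A, so no phase unwrapping is needed."""
    n = np.arange(len(a))
    A = np.fft.rfft(a, n_fft)
    B = np.fft.rfft(n * a, n_fft)
    return -np.real(B / A)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic AR(2) signal with one resonance (illustrative data only).
    e = rng.standard_normal(400)
    x = np.zeros(400)
    for t in range(400):
        x1 = x[t - 1] if t >= 1 else 0.0
        x2 = x[t - 2] if t >= 2 else 0.0
        x[t] = e[t] + 1.3 * x1 - 0.8 * x2
    frame = x[100:340] * np.hamming(240)
    a = lpc_autocorr(frame, order=12)
    gd = allpole_group_delay(a, n_fft=512)
    print(gd.shape)  # (257,)
```

In a feature pipeline, a group delay spectrum of this kind would typically be computed per frame and then compressed into a low-dimensional feature vector. How the paper performs that step is not described here, so the sketch stops at the group delay itself.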


doi: 10.21437/Interspeech.2013-416

Cite as: Rajan, P., Kinnunen, T., Hanilçi, C., Pohjalainen, J., Alku, P. (2013) Using group delay functions from all-pole models for speaker recognition. Proc. Interspeech 2013, 2489-2493, doi: 10.21437/Interspeech.2013-416

@inproceedings{rajan13_interspeech,
  author={Padmanabhan Rajan and Tomi Kinnunen and Cemal Hanilçi and Jouni Pohjalainen and Paavo Alku},
  title={{Using group delay functions from all-pole models for speaker recognition}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={2489--2493},
  doi={10.21437/Interspeech.2013-416}
}