11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Exploitation of Phase Information for Speaker Recognition

Ning Wang, P. C. Ching, Tan Lee

Chinese University of Hong Kong, China

Auditory experiments show that the human ear is insensitive to phase information when perceiving the phonetic content of a speech signal. However, the discarded phase information may provide useful acoustic cues for identifying individual speakers. This is especially useful for speaker recognition systems that operate on degraded magnitude information in adverse conditions. This paper is therefore motivated to derive phase-related features for reliable speaker recognition. A pertinent representation of the most dominant primary frequencies present in the speech signal is first built. It is then applied to frames of the speech signal to derive effective speaker-discriminative features. Through a set of specifically designed experiments on synthetic vowels, it is observed that the proposed features are capable of differentiating the included formants and pitch harmonics from other components, and of expressing the vocal particularities in variously shaped formants. When combined with standard cepstral parameters, these phase-related features are shown to clearly reduce the identification error rate and the equal error rate of a Gaussian mixture model-based speaker recognition system.
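
As a rough illustration of the premise that magnitude-only features discard phase, the sketch below builds two frames whose magnitude spectra are identical but whose phase spectra differ. It is not the feature extraction proposed in the paper; the helper names (`dft`, `synth_frame`), the frame length, and the choice of bins are assumptions made purely for this example.

```python
import cmath
import math

BINS = (4, 8, 12)  # hypothetical component frequencies, expressed as DFT bin indices


def dft(frame):
    """Naive discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def synth_frame(phases, n=64, bins=BINS):
    """Sum of unit-amplitude cosines at the given bins, one initial phase per bin."""
    return [
        sum(math.cos(2 * math.pi * b * t / n + p) for b, p in zip(bins, phases))
        for t in range(n)
    ]


# Two frames with the same component amplitudes but different initial phases.
frame_a = synth_frame((0.0, 0.0, 0.0))
frame_b = synth_frame((0.0, 1.0, 2.0))

spec_a, spec_b = dft(frame_a), dft(frame_b)

# Magnitude spectra: what a magnitude-based front end would see.
mag_a = [abs(c) for c in spec_a]
mag_b = [abs(c) for c in spec_b]
max_mag_diff = max(abs(a - b) for a, b in zip(mag_a, mag_b))

# Phase at the component bins: the information a magnitude-only front end drops.
phase_a = [cmath.phase(spec_a[b]) for b in BINS]
phase_b = [cmath.phase(spec_b[b]) for b in BINS]

print("largest magnitude difference:", max_mag_diff)
print("phases of frame A at component bins:", phase_a)
print("phases of frame B at component bins:", phase_b)
```

The two frames differ sample by sample, yet their magnitude spectra agree to within rounding error; only the phase values at the component bins tell them apart.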


Bibliographic reference.  Wang, Ning / Ching, P. C. / Lee, Tan (2010): "Exploitation of phase information for speaker recognition", In INTERSPEECH-2010, 2126-2129.