4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Robust Prosodic Features for Speaker Identification

M. J. Carey, E. S. Parris, H. Lloyd-Thomas, S. J. Bennett

Ensigma Ltd, Turing House, Station Road, Chepstow, Gwent, UK

This paper describes the use of prosodic features for speaker identification. Features based on the pitch and energy contours of speech are described, and the relative importance of each feature for speaker identification is investigated. The mean and variance of the pitch period in voiced sections of speech are shown to be particularly useful for discriminating between speakers. Fusing these features with a Hidden Markov Model speaker identification system gave a marked improvement in figure of merit: a gain of over 30% was achieved on the six NIST 1995 Evaluation tests presented. Handset variability is known to degrade performance when traditional spectral features, e.g. cepstra, are used. Results are presented showing that the prosodic features are more robust to handset variability.
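As an illustration of the pitch-period statistics mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a pitch tracker has already produced a per-frame fundamental frequency estimate in Hz and a per-frame voicing decision; the function name `pitch_period_stats` and its inputs are hypothetical, and the abstract does not specify how the paper extracts or normalizes these values.

```python
from statistics import fmean, pvariance


def pitch_period_stats(f0_hz, voiced):
    """Return (mean, variance) of the pitch period in seconds over voiced frames.

    f0_hz  : per-frame fundamental frequency estimates in Hz (assumed input)
    voiced : per-frame voicing flags from a pitch tracker (assumed input)
    """
    # The pitch period is the reciprocal of F0; unvoiced frames and frames
    # with a non-positive F0 estimate are skipped.
    periods = [1.0 / f for f, v in zip(f0_hz, voiced) if v and f > 0]
    if not periods:
        raise ValueError("no voiced frames with a positive F0 estimate")
    # Population variance is used here; that choice is an assumption.
    return fmean(periods), pvariance(periods)


if __name__ == "__main__":
    mean_period, var_period = pitch_period_stats(
        [100.0, 0.0, 125.0, 100.0], [True, False, True, True]
    )
    print(mean_period, var_period)
```

In a speaker identification setting, statistics like these would be computed per utterance and then combined with the scores of a spectral system; the abstract does not describe the fusion method, so it is not sketched here.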


Bibliographic reference. Carey, M. J. / Parris, E. S. / Lloyd-Thomas, H. / Bennett, S. J. (1996): "Robust prosodic features for speaker identification", in ICSLP-1996, 1800-1803.