2001: A Speaker Odyssey - The Speaker Recognition Workshop

June 18-22, 2001
Crete, Greece

Combining pitch and MFCC for speaker identification systems

Hassan Ezzaidi (1), Jean Rouat (2), Douglas O'Shaughnessy (2)

(1) ERMETIS, Université du Québec à Chicoutimi, Canada
(2) INRS-Télécommunications, Université du Québec, Montréal, Canada

Speaker recognition systems usually ignore the short-term dependence between the vocal source and the vocal tract. This paper presents a feasibility study that retains this dependence, using a model based on the joint probability function of the pitch and the feature vectors. Three strategies are designed and compared on all female speakers of the SPIDRE corpus. The first (baseline) strategy operates on all voiced and unvoiced speech segments. The second considers only voiced segments, and the third adds short-term pitch information to the standard MFCC features. Two pattern recognizers are used: LVQ-SLP and Gaussian mixture models (GMM). In all cases, identification rates increase, most notably for a duration of 500 ms (6% higher).
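The idea of modeling pitch jointly with MFCC can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state. Each voiced frame's MFCC vector is extended with its log-pitch as one extra dimension. Each speaker is modeled by a diagonal-covariance GMM over these joint vectors. A test segment is assigned to the speaker with the highest average frame log-likelihood. The helper names (`joint_feature`, `DiagGMM`, `identify`) and the toy parameters are hypothetical. Model training (e.g. EM) and the LVQ-SLP recognizer are not shown.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical sketch: joint (MFCC + log-pitch) vectors scored by per-speaker
# diagonal-covariance GMMs. Training (e.g. EM) is not shown.

def joint_feature(mfcc: Sequence[float], pitch_hz: float) -> List[float]:
    """Append log-pitch to one MFCC frame; only voiced frames have a pitch."""
    if pitch_hz <= 0:
        raise ValueError("joint vectors require a voiced frame (pitch > 0)")
    return list(mfcc) + [math.log(pitch_hz)]

def log_gauss_diag(x, mean, var) -> float:
    """Log density of a Gaussian with diagonal covariance `var`."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

class DiagGMM:
    def __init__(self, weights, means, variances):
        self.log_weights = [math.log(w) for w in weights]
        self.means = means
        self.variances = variances

    def frame_loglik(self, x: Sequence[float]) -> float:
        """log p(x) = log sum_k w_k N(x; mu_k, diag(var_k))."""
        return logsumexp([
            lw + log_gauss_diag(x, m, v)
            for lw, m, v in zip(self.log_weights, self.means, self.variances)
        ])

    def score(self, frames: List[Sequence[float]]) -> float:
        """Average frame log-likelihood over a test segment."""
        return sum(self.frame_loglik(x) for x in frames) / len(frames)

def identify(models: Dict[str, DiagGMM], frames) -> str:
    """Closed-set identification: pick the best-scoring speaker model."""
    return max(models, key=lambda name: models[name].score(frames))

# Toy usage with 2 MFCC coefficients + log-pitch (3-D joint vectors).
# Parameters are invented for illustration, not trained on SPIDRE.
models = {
    "spk_a": DiagGMM([0.5, 0.5],
                     [[0.0, 0.0, math.log(240.0)], [0.5, -0.5, math.log(260.0)]],
                     [[1.0, 1.0, 0.01], [1.0, 1.0, 0.01]]),
    "spk_b": DiagGMM([0.5, 0.5],
                     [[0.0, 0.0, math.log(170.0)], [0.5, -0.5, math.log(185.0)]],
                     [[1.0, 1.0, 0.01], [1.0, 1.0, 0.01]]),
}
# Joint vectors built from (MFCC, pitch) pairs of a test segment's voiced frames
frames = [joint_feature([0.1, -0.1], 245.0), joint_feature([0.0, 0.05], 235.0)]
print(identify(models, frames))  # spk_a
```

In this sketch the two toy speakers share MFCC means, so only the pitch dimension separates them. Dropping the last dimension gives an MFCC-only model restricted to voiced frames, which is closer to the second strategy. The test duration (for example 500 ms, or about 50 frames if a 10 ms frame shift is assumed) only changes how many frames are averaged in `score`.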

