INTERSPEECH 2013
14thAnnual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Using Phone Log-Likelihood Ratios as Features for Speaker Recognition

Mireia Diez, Amparo Varona, Mikel Penagarikano, Luis Javier Rodríguez-Fuentes, Germán Bordel

Universidad del País Vasco, Spain

The so called Phone Log-Likelihood Ratio (PLLR) features, computed on phone posterior probabilities provided by phonetic decoders, convey acoustic-phonetic information in a sequence of frame-level vectors. Thus, PLLRs can be easily plugged into traditional acoustic systems just by replacing MFCCs, PLPs or whatever other representation. PLLR features were used under an iVector-PLDA approach in our submission to the NIST 2012 Speaker Recognition Evaluation (SRE). In this work, we present a report of the goodness of these features for speaker recognition. Results on the telephone clean speech condition of the NIST 2010 and 2012 SRE show that, although the system based on PLLR features does not reach state-of-the-art performance, including it in a fusion with a traditional acoustic based system (trained on MFCC features) provides remarkable gains in performance (among the best reported in the NIST 2012 SRE telephone without added noise condition), revealing a fruitful way of using acoustic-phonetic information for speaker recognition.

Full Paper

Bibliographic reference.  Diez, Mireia / Varona, Amparo / Penagarikano, Mikel / Rodríguez-Fuentes, Luis Javier / Bordel, Germán (2013): "Using phone log-likelihood ratios as features for speaker recognition", In INTERSPEECH-2013, 2504-2508.