ISCA Archive Interspeech 2013

Using phone log-likelihood ratios as features for speaker recognition

Mireia Diez, Amparo Varona, Mikel Penagarikano, Luis Javier Rodríguez-Fuentes, Germán Bordel

The so-called Phone Log-Likelihood Ratio (PLLR) features, computed on phone posterior probabilities provided by phonetic decoders, convey acoustic-phonetic information in a sequence of frame-level vectors. PLLRs can therefore be plugged into traditional acoustic systems simply by replacing MFCCs, PLPs or any other frame-level representation. PLLR features were used under an iVector-PLDA approach in our submission to the NIST 2012 Speaker Recognition Evaluation (SRE). In this work, we assess the effectiveness of these features for speaker recognition. Results on the clean telephone speech condition of the NIST 2010 and 2012 SRE show that, although the system based on PLLR features does not reach state-of-the-art performance, fusing it with a traditional acoustic system (trained on MFCC features) provides remarkable gains in performance (among the best reported in the NIST 2012 SRE telephone without added noise condition), revealing a fruitful way of using acoustic-phonetic information for speaker recognition.
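
As a rough illustration of how frame-level phone posteriors can be turned into log-likelihood ratio features, the sketch below assumes the formulation LLR_i = log(p_i / ((1 - p_i) / (N - 1))) for N phone classes. The abstract does not state the formula, so this formulation, the function name `pllr_from_posteriors`, and the `eps` clipping floor are assumptions made for illustration and not necessarily the authors' implementation.

```python
import math


def pllr_from_posteriors(posteriors, eps=1e-10):
    """Map one frame of phone posteriors to log-likelihood ratios.

    Assumed formulation (not taken from the abstract):
        LLR_i = log(p_i / ((1 - p_i) / (N - 1)))
    where N is the number of phone classes.
    """
    n = len(posteriors)
    if n < 2:
        raise ValueError("at least two phone classes are required")
    features = []
    for p in posteriors:
        # Clip to avoid log(0) when a posterior is exactly 0 or 1.
        p = min(max(p, eps), 1.0 - eps)
        features.append(math.log(p) - math.log(1.0 - p) + math.log(n - 1))
    return features


# Hypothetical posteriors for a single frame over three phone classes.
frame = [0.7, 0.2, 0.1]
pllr = pllr_from_posteriors(frame)
```

Applying the function to every frame yields one vector per frame, which is the shape an MFCC-based front end would also produce; under the assumed formulation, a uniform posterior maps to a vector of zeros.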


doi: 10.21437/Interspeech.2013-419

Cite as: Diez, M., Varona, A., Penagarikano, M., Rodríguez-Fuentes, L.J., Bordel, G. (2013) Using phone log-likelihood ratios as features for speaker recognition. Proc. Interspeech 2013, 2504-2508, doi: 10.21437/Interspeech.2013-419

@inproceedings{diez13b_interspeech,
  author={Mireia Diez and Amparo Varona and Mikel Penagarikano and Luis Javier Rodríguez-Fuentes and Germán Bordel},
  title={{Using phone log-likelihood ratios as features for speaker recognition}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={2504--2508},
  doi={10.21437/Interspeech.2013-419}
}