Odyssey 2012 - The Speaker and Language Recognition Workshop

Singapore
June 25-28, 2012

Effects of Audio and ASR Quality on Cepstral and High-level Speaker Verification Systems

Andreas Stolcke (1), Martin Graciarena (2), Luciana Ferrer (2)

(1) Conversational Systems Lab, Microsoft, Mountain View, CA, USA
(2) Speech Technology and Research Laboratory, SRI International, Menlo Park, CA, USA

Speech data for NIST speaker recognition evaluations has traditionally been distributed in compressed, telephone-quality form, even for microphone data that was originally recorded at higher quality. We evaluate the effect that improved audio quality has on speaker verification performance, using a recently released full-bandwidth version of microphone data from the SRE2010 evaluation. Remarkably, we find substantially improved results even though the underlying speaker recognition models remain based on a telephone-band feature front end. For a cepstral GMM system, we show improvements that come purely from the elimination of lossy (μ-law) coding and from more effective noise-reduction filtering at the full bandwidth. We also find that higher-level speaker recognition systems can benefit from the better ASR quality enabled by the improved audio quality. Specifically, we show that a speech recognizer trained on full-bandwidth, distant-microphone meeting speech data yields reduced speaker verification error for speaker models based on MLLR features and word N-gram features.
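As a rough illustration of why μ-law coding is lossy, the following sketch compands a sample, quantizes it to 8 bits, and expands it again. It is a simplified illustration only: it uses the continuous μ-law formula with uniform quantization of the companded value, not the bit-exact segment encoding used in telephony, and it is not the processing used in the paper. The function names and the sine test signal are hypothetical.

```python
import math

MU = 255  # companding constant conventionally used with 8-bit mu-law


def mulaw_compress(x: float) -> float:
    """Map a sample in [-1, 1] to the companded domain [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress: ((1 + MU)^|y| - 1) / MU with the sign of y."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def mulaw_roundtrip(x: float, bits: int = 8) -> float:
    """Compress, quantize to `bits` bits, and expand (simplified, not bit-exact)."""
    levels = 2 ** bits - 1
    y = mulaw_compress(x)
    code = round((y + 1) / 2 * levels)  # integer code in 0..levels
    y_hat = code / levels * 2 - 1       # quantized companded value
    return mulaw_expand(y_hat)


def max_roundtrip_error(n: int = 1000) -> float:
    """Largest absolute reconstruction error over one cycle of a test sine."""
    samples = [math.sin(2 * math.pi * k / n) for k in range(n)]  # hypothetical signal
    return max(abs(s - mulaw_roundtrip(s)) for s in samples)


if __name__ == "__main__":
    print(f"max round-trip error: {max_roundtrip_error():.4f}")
```

Without the quantization step, compression followed by expansion is exact up to floating-point precision; the reconstruction error comes entirely from rounding to 8 bits, and it is largest for high-amplitude samples, where the companded steps are coarsest.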

Bibliographic reference.  Stolcke, Andreas / Graciarena, Martin / Ferrer, Luciana (2012): "Effects of audio and ASR quality on cepstral and high-level speaker verification systems", In Odyssey-2012, 298-303.