ISCA Archive Odyssey 2012

Bottleneck features for speaker recognition

Sibel Yaman, Jason Pelecanos, Ruhi Sarikaya

Bottleneck neural networks have recently found success in a variety of speech recognition tasks. This paper presents an approach in which they are utilized in the front-end of a speaker recognition system. The network inputs are mel-frequency cepstral coefficients (MFCCs) from multiple consecutive frames, and the outputs are speaker labels. We propose using a recording-level criterion that is optimized via an online learning algorithm. We furthermore propose retraining a network to focus on its errors when leveraging scores from an independently trained system. We ran experiments on the same- and different-microphone tasks of the 2010 NIST Speaker Recognition Evaluation. We found that the proposed bottleneck feature extraction paradigm performs slightly worse than MFCCs but provides complementary information in combination. We also found that the proposed combination strategy with retraining improved the EER by 14% and 18% relative over the baseline MFCC system in the same- and different-microphone tasks, respectively.
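To illustrate the front-end described above, the following is a minimal NumPy sketch, not the authors' implementation: a network takes stacked MFCC frames as input, is trained against speaker labels, and at extraction time the narrow hidden layer's activations are kept as features while the output layer is discarded. All sizes (9 stacked frames of 13 MFCCs, a 512-unit hidden layer, a 40-dimensional bottleneck, 100 training speakers) are illustrative assumptions, and the weights here are random stand-ins for a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 9 consecutive frames
# of 13 MFCCs stacked into one input vector, a narrow "bottleneck"
# layer, and one output per training speaker.
n_frames, n_ceps = 9, 13
d_in = n_frames * n_ceps          # 117-dim stacked MFCC input
d_hidden, d_bottleneck = 512, 40
n_speakers = 100

# Randomly initialized weights stand in for a trained network.
W1 = rng.standard_normal((d_in, d_hidden)) * 0.01
W2 = rng.standard_normal((d_hidden, d_bottleneck)) * 0.01
W3 = rng.standard_normal((d_bottleneck, n_speakers)) * 0.01

def bottleneck_features(stacked_mfccs):
    """Forward pass up to the bottleneck layer only.

    At feature-extraction time the speaker-label output layer (W3)
    is dropped, and the bottleneck activations become the frame-level
    features fed to the downstream speaker recognition system.
    """
    h = np.tanh(stacked_mfccs @ W1)
    return np.tanh(h @ W2)

# One utterance of 200 frames -> 200 bottleneck feature vectors.
x = rng.standard_normal((200, d_in))
feats = bottleneck_features(x)
print(feats.shape)  # (200, 40)
```

The output layer exists only to supply a training signal; the feature dimensionality is set by the bottleneck width, not by the number of speakers.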

Cite as: Yaman, S., Pelecanos, J., Sarikaya, R. (2012) Bottleneck features for speaker recognition. Proc. The Speaker and Language Recognition Workshop (Odyssey 2012), 105-108

@inproceedings{yaman12_odyssey,
  author={Sibel Yaman and Jason Pelecanos and Ruhi Sarikaya},
  title={{Bottleneck features for speaker recognition}},
  year=2012,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2012)},
  pages={105--108}
}