ISCA Archive Odyssey 2020

Speech Bandwidth Expansion For Speaker Recognition On Telephony Audio

Ganesh Sivaraman, Amruta Vidwans, Elie Khoury

Practical applications often require speaker recognition systems to work well for audio files of different sampling rates. However, the performance of speaker recognition systems degrades substantially when the audio sampling rate differs between the training and testing conditions. For example, wideband speaker recognition models trained on audio files with a 16 kHz sampling rate perform poorly on telephony audio with an 8 kHz sampling rate because the higher-frequency information is missing. In this paper, we propose a Deep Neural Network (DNN) based system to estimate the speech spectrum in the frequencies above 4 kHz for narrowband 8 kHz telephony audio. We train the proposed system on speech datasets processed using various simulated telephony codecs. Additionally, we perform speaker recognition and verification experiments using the bandwidth expansion system as a preprocessor for wideband speaker verification models. The datasets used for the speaker verification experiments are downsampled VoxCeleb1, downsampled SITW data, and the NIST SRE 2010 protocols. We see a significant improvement in the results compared to simple upsampling with interpolation and low-pass filtering. These promising experiments show that the proposed bandwidth expansion system can be used successfully as a data augmentation method for the training of speaker embeddings.
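For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of "upsampling with interpolation and low-pass filtering" from 8 kHz to 16 kHz. The abstract does not specify the interpolation filter, so the function name `upsample_2x`, the Hamming-windowed sinc design, and the tap count are all assumptions made for illustration, not the authors' implementation. The sketch shows why this baseline cannot help a wideband model: the output has a 16 kHz sampling rate, but the band above 4 kHz stays essentially empty.

```python
import numpy as np


def upsample_2x(x, num_taps=101):
    """Upsample by 2 (e.g. 8 kHz -> 16 kHz): zero-stuff, then low-pass filter.

    Hypothetical helper for illustration; the filter design is an assumption.
    Assumes len(x) * 2 >= num_taps so the output has exactly 2 * len(x) samples.
    """
    x = np.asarray(x, dtype=np.float64)

    # Insert one zero after every sample. This doubles the sampling rate but
    # mirrors the 0-4 kHz content into the 4-8 kHz band.
    stuffed = np.zeros(2 * len(x))
    stuffed[::2] = x

    # Hamming-windowed sinc low-pass filter with cutoff at a quarter of the
    # new sampling rate (4 kHz when the output rate is 16 kHz).
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 0.5 * np.sinc(0.5 * n) * np.hamming(num_taps)
    h *= 2.0 / h.sum()  # DC gain of 2 compensates for the inserted zeros

    # mode="same" keeps the output aligned with the zero-stuffed signal.
    return np.convolve(stuffed, h, mode="same")


# Usage: a 1 kHz tone sampled at 8 kHz, upsampled to 16 kHz.
sr_in = 8000
t = np.arange(sr_in) / sr_in
tone_8k = np.sin(2 * np.pi * 1000 * t)
tone_16k = upsample_2x(tone_8k)
```

The filter removes the mirrored image above 4 kHz rather than reconstructing real high-frequency speech content, which is the gap the proposed DNN-based bandwidth expansion is meant to fill.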

doi: 10.21437/Odyssey.2020-63

Cite as: Sivaraman, G., Vidwans, A., Khoury, E. (2020) Speech Bandwidth Expansion For Speaker Recognition On Telephony Audio. Proc. The Speaker and Language Recognition Workshop (Odyssey 2020), 440-445, doi: 10.21437/Odyssey.2020-63

  author={Ganesh Sivaraman and Amruta Vidwans and Elie Khoury},
  title={{Speech Bandwidth Expansion For Speaker Recognition On Telephony Audio}},
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2020)},