ISCA Archive Interspeech 2013

Modulation features for noise robust speaker identification

Vikramjit Mitra, Mitchell McLaren, Horacio Franco, Martin Graciarena, Nicolas Scheffer

Current state-of-the-art speaker identification (SID) systems perform exceptionally well under clean conditions, but their performance deteriorates when noise and channel degradations are introduced. The literature has mostly focused on robust modeling techniques to combat degradations due to background noise and/or channel effects, and has demonstrated significant improvement in SID performance in noise. In this paper, we present a robust acoustic feature on top of robust modeling techniques to further improve SID performance. We propose Modulation features of Medium Duration sub-band Speech Amplitudes (MMeDuSA), an acoustic feature motivated by human auditory processing that is robust to noise corruption and captures speaker stylistic differences. We analyze the performance of MMeDuSA with SRI International's robust SID system on a channel- and noise-degraded multilingual corpus distributed through the Defense Advanced Research Projects Agency (DARPA) Robust Automatic Transcription of Speech (RATS) program. When benchmarked against standard cepstral features (MFCC) and other noise-robust acoustic features, MMeDuSA provided lower SID error rates.


doi: 10.21437/Interspeech.2013-695

Cite as: Mitra, V., McLaren, M., Franco, H., Graciarena, M., Scheffer, N. (2013) Modulation features for noise robust speaker identification. Proc. Interspeech 2013, 3703-3707, doi: 10.21437/Interspeech.2013-695

@inproceedings{mitra13b_interspeech,
  author={Vikramjit Mitra and Mitchell McLaren and Horacio Franco and Martin Graciarena and Nicolas Scheffer},
  title={{Modulation features for noise robust speaker identification}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={3703--3707},
  doi={10.21437/Interspeech.2013-695}
}