Multi-Task Discriminative Training of Hybrid DNN-TVM Model for Speaker Verification with Noisy and Far-Field Speech

Arindam Jati, Raghuveer Peri, Monisankha Pal, Tae Jin Park, Naveen Kumar, Ruchir Travadi, Panayiotis Georgiou, Shrikanth Narayanan


This paper addresses speaker verification with single-channel, noisy, and far-field speech by learning an embedding (feature representation) that is invariant to different acoustic environments. We approach the problem from two directions. First, we adopt a recently proposed discriminative model that hybridizes a Deep Neural Network (DNN) and a Total Variability Model (TVM) with the goal of integrating their strengths: the DNN helps learn a unique variable-length representation of the feature sequence, while the TVM accumulates it into a fixed-dimensional vector. Second, we propose a multi-task training scheme with cross-entropy and triplet losses in order to obtain both good classification performance and distinctive speaker embeddings. The multi-task training is applied to both the DNN-TVM model and the state-of-the-art x-vector system. Results on the development and evaluation sets of the VOiCES challenge reveal that the proposed multi-task training helps improve models trained solely with cross entropy, and that it works better with the DNN-TVM architecture than with the x-vector system for the current task. Moreover, the multi-task models tend to show a complementary relationship with the cross-entropy models, and thus improved performance is observed after fusion.
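The multi-task objective described in the abstract combines a speaker-classification term (softmax cross entropy) with an embedding-geometry term (triplet loss). A minimal sketch of such a combined loss for a single example follows. It is not the authors' implementation: the weighting `alpha`, the `margin`, the use of squared Euclidean distance, and all function names are hypothetical choices for illustration, since the abstract does not state how the two losses are combined or how triplets are sampled.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross entropy for one example, using log-sum-exp for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin: float = 0.2) -> float:
    """Hinge loss: the same-speaker pair should be closer than the
    different-speaker pair by at least `margin` (hypothetical value)."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)


def multitask_loss(logits, target, anchor, positive, negative,
                   alpha: float = 0.5, margin: float = 0.2) -> float:
    """Hypothetical weighted sum: alpha * cross entropy + (1 - alpha) * triplet."""
    return (alpha * cross_entropy(logits, target)
            + (1.0 - alpha) * triplet_loss(anchor, positive, negative, margin))


# Example: a correctly classified utterance whose embedding already
# satisfies the margin, so only the cross-entropy term contributes.
loss = multitask_loss(logits=[2.0, 0.5, -1.0], target=0,
                      anchor=[1.0, 0.0], positive=[0.9, 0.1], negative=[0.0, 1.0])
print(round(loss, 4))
```

Setting `alpha = 1.0` reduces the objective to plain cross-entropy training, which corresponds to the single-loss baseline that the multi-task models are compared against.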


DOI: 10.21437/Interspeech.2019-3010

Cite as: Jati, A., Peri, R., Pal, M., Park, T.J., Kumar, N., Travadi, R., Georgiou, P., Narayanan, S. (2019) Multi-Task Discriminative Training of Hybrid DNN-TVM Model for Speaker Verification with Noisy and Far-Field Speech. Proc. Interspeech 2019, 2463-2467, DOI: 10.21437/Interspeech.2019-3010.


@inproceedings{Jati2019,
  author={Arindam Jati and Raghuveer Peri and Monisankha Pal and Tae Jin Park and Naveen Kumar and Ruchir Travadi and Panayiotis Georgiou and Shrikanth Narayanan},
  title={{Multi-Task Discriminative Training of Hybrid DNN-TVM Model for Speaker Verification with Noisy and Far-Field Speech}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2463--2467},
  doi={10.21437/Interspeech.2019-3010},
  url={http://dx.doi.org/10.21437/Interspeech.2019-3010}
}