Joint Optimization of Neural Acoustic Beamforming and Dereverberation with x-Vectors for Robust Speaker Verification

Joon-Young Yang, Joon-Hyuk Chang


In this paper, we investigate deep neural network (DNN)-supported acoustic beamforming and dereverberation as the front-end of the x-vector speaker verification (SV) framework in noisy and reverberant environments. First, a DNN that supports either a classical beamforming algorithm (e.g., MVDR) or a dereverberation algorithm (e.g., WPE) is trained on multi-channel speech signals. Next, an x-vector speaker embedding network is trained on top of the enhanced speech features to classify the training speakers. Finally, after these separate training stages, either one or both of the DNN-supported beamforming and dereverberation modules are serially connected to the x-vector network and jointly trained to optimize the common objective of speaker classification. Experiments on an artificially generated speech dataset, built with simulated and real room impulse responses (RIRs) and various types of domestic noise, show that jointly training the supporting neural network models with the x-vector network within the classical speech enhancement framework yields significant performance gains for robust text-independent (TI) SV.
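The abstract does not specify how the DNN "supports" MVDR. A common approach, assumed here, is for the network to predict time-frequency masks for speech and noise. Those masks weight mask-based spatial covariance estimates, which then give the MVDR filter in the reference-microphone form of Souden et al. The NumPy sketch below illustrates only that beamforming step. The function names, the (F, T, C) array layout, and the use of given masks in place of a trained network are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def spatial_covariance(stft, mask):
    """Mask-weighted spatial covariance matrix per frequency bin.

    stft: complex array (F, T, C), multichannel STFT.
    mask: real array (F, T) in [0, 1]; assumed to come from a DNN
          (hypothetical here, any mask works for the math).
    Returns an (F, C, C) array of Hermitian matrices.
    """
    weighted = mask[..., None] * stft
    # sum_t m(f,t) x(f,t) x(f,t)^H
    phi = np.einsum("ftc,ftd->fcd", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=1), 1e-10)
    return phi / norm[:, None, None]


def mvdr_weights(phi_s, phi_n, ref=0, eps=1e-10):
    """Reference-channel MVDR: w_f = Phi_n^-1 Phi_s u_ref / tr(Phi_n^-1 Phi_s)."""
    num = np.linalg.solve(phi_n, phi_s)        # Phi_n^{-1} Phi_s, shape (F, C, C)
    trace = np.trace(num, axis1=1, axis2=2)    # shape (F,)
    return num[:, :, ref] / (trace[:, None] + eps)  # shape (F, C)


def apply_beamformer(w, stft):
    """Beamformer output Y(f, t) = w_f^H X(f, t), shape (F, T)."""
    return np.einsum("fc,ftc->ft", w.conj(), stft)
```

Each step is differentiable linear algebra: a masked outer product, a linear solve, and a trace. If the same computation is written in an autodiff framework, a speaker-classification loss on the x-vector output can back-propagate into the mask-estimating network. That is presumably what makes the joint training described above possible. For a rank-1 speech covariance with steering vector h, these weights satisfy w^H h = h_ref, so speech at the reference microphone passes through undistorted.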


DOI: 10.21437/Interspeech.2019-1356

Cite as: Yang, J., Chang, J. (2019) Joint Optimization of Neural Acoustic Beamforming and Dereverberation with x-Vectors for Robust Speaker Verification. Proc. Interspeech 2019, 4075-4079, DOI: 10.21437/Interspeech.2019-1356.


@inproceedings{Yang2019,
  author={Joon-Young Yang and Joon-Hyuk Chang},
  title={{Joint Optimization of Neural Acoustic Beamforming and Dereverberation with x-Vectors for Robust Speaker Verification}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={4075--4079},
  doi={10.21437/Interspeech.2019-1356},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1356}
}