Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition

Shengkui Zhao, Chongjia Ni, Rong Tong, Bin Ma


Robustness to noise and reverberation is a critical issue for automatic speech recognition (ASR) systems. Speech enhancement and model adaptation have long been studied to address it. Recently, multi-task joint-learning schemes that address noise-reduction and ASR criteria in a unified modeling framework have shown promising improvements, but their training relies heavily on paired clean-noisy data. To overcome this limitation, generative adversarial networks (GANs) and adversarial training have been deployed, which greatly simplify model training by removing the need for complex front-end design and paired training data. Despite the rapid development of GANs in computer vision, only regular GANs have so far been adopted for robust ASR. In this work, we adopt the more advanced cycle-consistency GAN (CycleGAN) to address the training failures caused by mode collapse in regular GANs. Using deep residual networks (ResNets), we further extend the multi-task scheme to a multi-task multi-network joint-learning scheme for more robust noise reduction and model adaptation. Experimental results on CHiME-4 show that our proposed approach significantly improves the noise robustness of the ASR system, achieving much lower word error rates (WERs) than state-of-the-art joint-learning approaches.
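The cycle-consistency objective at the heart of CycleGAN can be sketched in a few lines. The snippet below is a toy illustration only, not the paper's implementation: the two "generators" `G` (clean-to-noisy domain) and `F` (noisy-to-clean domain) are hypothetical linear stand-ins for the ResNet-based mapping networks, chosen as exact inverses so the cycle loss is near zero; in practice the loss is minimized during adversarial training on unpaired data.

```python
import numpy as np

# Toy linear "generators" standing in for ResNet mapping networks
# (hypothetical, for illustration): G maps clean features to the
# noisy domain, F maps noisy features back to the clean domain.
rng = np.random.default_rng(0)
W_g = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
W_f = np.linalg.inv(W_g)  # exact inverse, so the cycle is consistent

def G(x):
    """Clean -> noisy domain (placeholder for a ResNet generator)."""
    return x @ W_g

def F(y):
    """Noisy -> clean domain (placeholder for a ResNet generator)."""
    return y @ W_f

def cycle_consistency_loss(x_clean, y_noisy):
    """CycleGAN-style L1 cycle loss:
    mean |F(G(x)) - x| + mean |G(F(y)) - y|."""
    forward = np.mean(np.abs(F(G(x_clean)) - x_clean))
    backward = np.mean(np.abs(G(F(y_noisy)) - y_noisy))
    return forward + backward

# Unpaired batches of clean and noisy feature frames (random toy data)
x = rng.standard_normal((8, 4))
y = rng.standard_normal((8, 4))
loss = cycle_consistency_loss(x, y)
print(loss)  # near zero, since the toy generators are exact inverses
```

Because the loss only compares each sample with its own reconstruction, it requires no clean-noisy pairing, which is what lets CycleGAN-based enhancement train on unpaired data.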


 DOI: 10.21437/Interspeech.2019-2078

Cite as: Zhao, S., Ni, C., Tong, R., Ma, B. (2019) Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition. Proc. Interspeech 2019, 1238-1242, DOI: 10.21437/Interspeech.2019-2078.


@inproceedings{Zhao2019,
  author={Shengkui Zhao and Chongjia Ni and Rong Tong and Bin Ma},
  title={{Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={1238--1242},
  doi={10.21437/Interspeech.2019-2078},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2078}
}