ISCA Archive Interspeech 2015

Multi-task learning deep neural networks for speech feature denoising

Bin Huang, Dengfeng Ke, Hao Zheng, Bo Xu, Yanyan Xu, Kaile Su

Traditional automatic speech recognition (ASR) systems usually suffer a sharp performance drop when noise is present in speech. To make ASR more robust, we introduce a new model that uses multi-task learning deep neural networks (MTL-DNN) to solve the speech denoising task at the feature level. In this model, the networks are initialized by pre-training restricted Boltzmann machines (RBMs) and fine-tuned by jointly learning multiple interactive tasks using a shared representation. For multi-task learning, we choose a noisy-clean speech pair fitting task as the primary task and separately explore two constraints as secondary tasks: phone label and phone cluster. In experiments, the denoised speech is reconstructed by the MTL-DNN from noisy speech input and is evaluated by both a DNN-hidden Markov model (HMM) based and a Gaussian mixture model (GMM)-HMM based ASR system. Results show that, using the denoised speech, the word error rate (WER) is reduced by 53.14% and 34.84%, respectively, compared with the baselines. The MTL-DNN model also outperforms the general single-task learning deep neural network (STL-DNN) model, with performance improvements of 4.93% and 3.88%, respectively.
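
The joint objective described in the abstract can be illustrated with a minimal sketch: one shared hidden layer feeds a regression head (the primary noisy-to-clean fitting task) and a classification head (the secondary phone-label task), and the two losses are summed. This is not the authors' implementation. The layer sizes, the single hidden layer, the weight `secondary_weight`, and all function names are assumptions made for illustration, and random initialization stands in for RBM pre-training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, not taken from the paper.
FEAT_DIM, HIDDEN_DIM, NUM_PHONES = 40, 64, 10


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params():
    # Assumption: random weights stand in for RBM pre-training.
    return {
        "W_shared": rng.normal(0.0, 0.1, (FEAT_DIM, HIDDEN_DIM)),
        "b_shared": np.zeros(HIDDEN_DIM),
        "W_denoise": rng.normal(0.0, 0.1, (HIDDEN_DIM, FEAT_DIM)),
        "b_denoise": np.zeros(FEAT_DIM),
        "W_phone": rng.normal(0.0, 0.1, (HIDDEN_DIM, NUM_PHONES)),
        "b_phone": np.zeros(NUM_PHONES),
    }


def forward(params, noisy):
    # Shared representation used by both tasks.
    hidden = sigmoid(noisy @ params["W_shared"] + params["b_shared"])
    # Primary head: linear regression onto clean features.
    denoised = hidden @ params["W_denoise"] + params["b_denoise"]
    # Secondary head: unnormalized phone-label scores.
    logits = hidden @ params["W_phone"] + params["b_phone"]
    return denoised, logits


def mtl_loss(params, noisy, clean, phone_ids, secondary_weight=0.1):
    denoised, logits = forward(params, noisy)
    # Primary loss: mean squared error between denoised and clean features.
    mse = np.mean((denoised - clean) ** 2)
    # Secondary loss: cross-entropy via a numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -np.mean(log_probs[np.arange(len(phone_ids)), phone_ids])
    # Assumed combination rule: weighted sum of the two task losses.
    return mse + secondary_weight * ce, mse, ce
```

At test time only the primary head would be needed: `forward(params, noisy)[0]` returns the denoised features that a downstream recognizer consumes, while the phone-label head serves only as a training-time constraint on the shared layer.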


doi: 10.21437/Interspeech.2015-532

Cite as: Huang, B., Ke, D., Zheng, H., Xu, B., Xu, Y., Su, K. (2015) Multi-task learning deep neural networks for speech feature denoising. Proc. Interspeech 2015, 2464-2468, doi: 10.21437/Interspeech.2015-532

@inproceedings{huang15e_interspeech,
  author={Bin Huang and Dengfeng Ke and Hao Zheng and Bo Xu and Yanyan Xu and Kaile Su},
  title={{Multi-task learning deep neural networks for speech feature denoising}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={2464--2468},
  doi={10.21437/Interspeech.2015-532}
}