16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Multi-Task Learning Deep Neural Networks for Speech Feature Denoising

Bin Huang (1), Dengfeng Ke (2), Hao Zheng (2), Bo Xu (2), Yanyan Xu (1), Kaile Su (3)

(1) Beijing Forestry University, China
(2) Chinese Academy of Sciences, China
(3) Griffith University, Australia

Traditional automatic speech recognition (ASR) systems usually suffer a sharp performance drop when noise is present in speech. To build robust ASR, we introduce a new model using multi-task learning deep neural networks (MTL-DNN) to solve the speech denoising task at the feature level. In this model, the networks are initialized by pre-training restricted Boltzmann machines (RBMs) and fine-tuned by jointly learning multiple interactive tasks using a shared representation. In multi-task learning, we choose a noisy-clean speech pair fitting task as the primary task and separately explore two constraints as secondary tasks: phone label and phone cluster. In experiments, the denoised speech is reconstructed by the MTL-DNN using the noisy speech as input, and it is evaluated by both DNN-hidden Markov model (HMM) based and Gaussian mixture model (GMM)-HMM based ASR systems. Results show that, using the denoised speech, the word error rate (WER) is reduced by 53.14% and 34.84% respectively compared with the baselines. The MTL-DNN model also outperforms the general single-task learning deep neural networks (STL-DNN) model, with performance improvements of 4.93% and 3.88%, respectively.
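The training objective described in the abstract, a primary noisy-to-clean feature fitting task and a secondary phone-classification task learned from one shared representation, can be sketched roughly as below. This is a minimal illustrative sketch, not the authors' implementation: the dimensions, the single sigmoid hidden layer, and the task weight `lam` are assumptions (the paper's networks are deep and RBM-pretrained), and only the forward pass and combined loss are shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 40-dim noisy input features, 40-dim clean
# targets, 10 phone classes (illustrative values, not from the paper).
D_IN, D_HID, D_OUT, N_PHONES = 40, 128, 40, 10

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Shared representation (in the paper this would be initialized from
# stacked RBMs; random initialization here for brevity).
W_shared = rng.normal(0, 0.1, (D_IN, D_HID))
b_shared = np.zeros(D_HID)

# Task-specific output heads on top of the shared hidden layer.
W_fit = rng.normal(0, 0.1, (D_HID, D_OUT))       # primary: clean-feature fitting
b_fit = np.zeros(D_OUT)
W_phone = rng.normal(0, 0.1, (D_HID, N_PHONES))  # secondary: phone labels
b_phone = np.zeros(N_PHONES)

def mtl_loss(noisy, clean, phone_ids, lam=0.3):
    """Combined multi-task loss: MSE on the fitting task plus
    lam * cross-entropy on the phone-label task, both computed
    from the same shared hidden representation."""
    h = sigmoid(noisy @ W_shared + b_shared)      # shared representation
    denoised = h @ W_fit + b_fit                  # regression head
    probs = softmax(h @ W_phone + b_phone)        # classification head
    mse = np.mean((denoised - clean) ** 2)
    ce = -np.mean(np.log(probs[np.arange(len(phone_ids)), phone_ids] + 1e-12))
    return mse + lam * ce, denoised

# Toy minibatch of 8 frames.
noisy = rng.normal(size=(8, D_IN))
clean = rng.normal(size=(8, D_OUT))
phones = rng.integers(0, N_PHONES, size=8)
loss, denoised = mtl_loss(noisy, clean, phones)
print(denoised.shape)  # the denoised features fed to the ASR back end
```

At test time only the regression head is used: the denoised features reconstructed from noisy input are passed to the DNN-HMM or GMM-HMM recognizer; the secondary task serves purely as a training-time constraint on the shared representation.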


Bibliographic reference. Huang, Bin / Ke, Dengfeng / Zheng, Hao / Xu, Bo / Xu, Yanyan / Su, Kaile (2015): "Multi-task learning deep neural networks for speech feature denoising", in Proceedings of INTERSPEECH 2015, pp. 2464-2468.