ISCA Archive Interspeech 2015

Delta-melspectra features for noise robustness to DNN-based ASR systems

Kshitiz Kumar, Chaojun Liu, Yifan Gong

Deep neural networks (DNNs) have significantly improved automatic speech recognition (ASR) accuracy over a range of speech scenarios. However, noise robustness remains a challenge for DNNs: accuracy degrades significantly in noisy environments compared with clean ones. Many current DNN-based ASR engines use log-MelSpectra features, along with temporal-difference features in the form of delta and delta-delta features. In this work we introduce delta-MelSpectra features to seek significant gains for DNNs in noisy environments, and we demonstrate that taking the temporal difference directly in the MelSpectra domain can provide superior noise-robust features. We validate our delta-MelSpectra features on a multistyle-trained DNN-ASR system; testing on large-scale WindowsPhone client data, we obtained 17% and 12% relative reductions in word error rate (WER) for noisy and clean environments, respectively.
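
The contrast drawn in the abstract, temporal differences taken on log-MelSpectra versus directly on MelSpectra, can be illustrated with a minimal sketch. The abstract does not give the delta formula, so this assumes the common regression-based delta over a window of ±n frames with edge frames replicated. The function names (`delta`, `log_mel_delta`, `delta_mel`), the window size, the log floor, and the toy input are all hypothetical. Any compression or normalization the authors may apply after differencing is not described here and is omitted.

```python
import math

def delta(frames, n=2):
    """Regression-based temporal difference over +/- n frames.

    Assumed formula (not taken from the paper):
        d[t] = sum_k k * (c[t+k] - c[t-k]) / (2 * sum_k k^2)
    Frames beyond the edges are replaced by the first/last frame.
    """
    num_frames = len(frames)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = []
    for t in range(num_frames):
        row = []
        for b in range(len(frames[t])):
            acc = 0.0
            for k in range(1, n + 1):
                nxt = frames[min(t + k, num_frames - 1)][b]
                prv = frames[max(t - k, 0)][b]
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out

def log_mel_delta(mel, n=2, floor=1e-10):
    """Conventional route: take the log first, then the temporal difference."""
    log_mel = [[math.log(max(v, floor)) for v in frame] for frame in mel]
    return delta(log_mel, n)

def delta_mel(mel, n=2):
    """Route sketched from the abstract: difference directly on mel energies."""
    return delta(mel, n)

# Toy mel energies (hypothetical): 5 frames, 2 bands.
mel = [[1.0, 2.0], [2.0, 4.0], [3.0, 8.0], [4.0, 16.0], [5.0, 32.0]]
d_mel = delta_mel(mel)
d_log = log_mel_delta(mel)
```

In this toy input the first band grows linearly, so its MelSpectra-domain delta at the center frame equals the per-frame increase, while the second band doubles every frame, so its log-domain delta at the center frame equals ln 2. The two routes therefore respond to different kinds of change in the signal, which is the distinction the abstract relies on.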

doi: 10.21437/Interspeech.2015-528

Cite as: Kumar, K., Liu, C., Gong, Y. (2015) Delta-melspectra features for noise robustness to DNN-based ASR systems. Proc. Interspeech 2015, 2445-2448, doi: 10.21437/Interspeech.2015-528

  author={Kshitiz Kumar and Chaojun Liu and Yifan Gong},
  title={{Delta-melspectra features for noise robustness to DNN-based ASR systems}},
  booktitle={Proc. Interspeech 2015},