Deep neural networks (DNNs) have significantly improved automatic speech recognition (ASR) accuracy across a range of speech scenarios. However, noise robustness remains a challenge for DNNs: compared with clean conditions, accuracy degrades significantly in noisy environments. Many current DNN-based ASR engines use log-MelSpectra features together with their temporal differences, the delta and delta-delta features. In this work, we introduce delta-MelSpectra features to obtain significant gains for DNNs in noisy environments, and we demonstrate that taking the temporal difference directly in the MelSpectra domain yields more noise-robust features. We validate our delta-MelSpectra features on a multi-style trained DNN-ASR system: on large-scale WindowsPhone client data, we obtained 17% and 12% relative reductions in word error rate (WER) for noisy and clean environments, respectively.
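The abstract contrasts the usual deltas computed on log-MelSpectra with temporal differences taken directly in the MelSpectra domain. The sketch below is a toy illustration under our own assumptions, not the authors' pipeline: it uses the common regression delta formula with a window of ±2 frames (the abstract does not give a window) and a hypothetical constant additive-noise model (`add_stationary_noise`). Because the delta operator is linear, a stationary offset in the Mel domain cancels in MelSpectra deltas but not in log-MelSpectra deltas; this is offered only as one possible intuition for the reported robustness, not as the paper's stated analysis.

```python
import math
from typing import List

Frame = List[float]  # one frame of Mel filterbank energies


def delta(frames: List[Frame], N: int = 2) -> List[Frame]:
    """Regression delta over +/-N frames (assumed window); edge frames are replicated."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out: List[Frame] = []
    for t in range(T):
        d = [0.0] * len(frames[t])
        for n in range(1, N + 1):
            nxt = frames[min(t + n, T - 1)]
            prv = frames[max(t - n, 0)]
            for k in range(len(d)):
                d[k] += n * (nxt[k] - prv[k])
        out.append([v / denom for v in d])
    return out


def log_mel(frames: List[Frame], floor: float = 1e-10) -> List[Frame]:
    """Log-MelSpectra: element-wise log with a small floor to avoid log(0)."""
    return [[math.log(max(v, floor)) for v in f] for f in frames]


def add_stationary_noise(frames: List[Frame], level: float) -> List[Frame]:
    """Hypothetical toy noise model: a constant energy added to every bin of every frame."""
    return [[v + level for v in f] for f in frames]


# Toy single-band "clean" Mel energies and a noisy copy.
clean = [[1.0], [2.0], [4.0], [8.0], [16.0], [8.0], [4.0]]
noisy = add_stationary_noise(clean, 5.0)

# Temporal difference taken directly in the MelSpectra (linear) domain.
lin_clean, lin_noisy = delta(clean), delta(noisy)
# Conventional delta taken on log-MelSpectra.
log_clean, log_noisy = delta(log_mel(clean)), delta(log_mel(noisy))

print("MelSpectra delta unchanged by offset:",
      all(math.isclose(a[0], b[0]) for a, b in zip(lin_clean, lin_noisy)))  # True
print("log-MelSpectra delta unchanged by offset:",
      all(math.isclose(a[0], b[0]) for a, b in zip(log_clean, log_noisy)))  # False
```

Real noise is neither constant nor purely additive per frame, and the abstract does not say how delta-MelSpectra are scaled or combined with static features before the DNN, so this only illustrates the choice of domain for the temporal difference.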
Cite as: Kumar, K., Liu, C., Gong, Y. (2015) Delta-melspectra features for noise robustness to DNN-based ASR systems. Proc. Interspeech 2015, 2445-2448, doi: 10.21437/Interspeech.2015-528
@inproceedings{kumar15e_interspeech, author={Kshitiz Kumar and Chaojun Liu and Yifan Gong}, title={{Delta-melspectra features for noise robustness to DNN-based ASR systems}}, year=2015, booktitle={Proc. Interspeech 2015}, pages={2445--2448}, doi={10.21437/Interspeech.2015-528} }