Why do ASR Systems Despite Neural Nets Still Depend on Robust Features

Angel Mario Castro Martinez, Marc René Schädler


To what extent can neural nets learn the traditional signal processing stages of current robust ASR front-ends? Will neural nets replace the classical, often auditory-inspired feature extraction in the near future? To answer these questions, a DNN-based ASR system was trained and tested on the Aurora4 robust ASR task using various (intermediate) processing stages as input. Additionally, the training set was divided into several fractions to reveal the amount of data needed to make up for a missing processing step on the input signal or for missing prior knowledge about the auditory system. The DNN system was able to learn from ordinary spectrogram representations, outperforming MFCCs when using 75% of the training set and performing almost as well as log-Mel spectrograms with the full set. On the other hand, it was unable to match the robustness of auditory-based Gabor features, which outperformed every other representation even when using 40% of the training data. The study concludes that even with deep learning approaches, current ASR systems still benefit from a suitable feature extraction.
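As a rough illustration of what "intermediate processing stages" can mean, the sketch below chains three of the representations named in the abstract: a power spectrogram, a log-Mel spectrogram, and MFCCs. It is a minimal NumPy sketch, not the authors' front-end. All function names and parameter values (16 kHz sampling, 25 ms frames, 10 ms hop, 23 Mel filters, 13 cepstral coefficients) are assumptions chosen for illustration, and the Gabor features are not covered.

```python
import numpy as np

def power_spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    """Stage 1: frame the signal, apply a Hamming window, take |FFT|^2."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2

def mel_filterbank(n_mels=23, n_fft=512, sample_rate=16000):
    """Triangular filters spaced evenly on the Mel scale (assumed layout)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges_mel = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    edges_hz = mel_to_hz(edges_mel)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        lo, ctr, hi = edges_hz[m:m + 3]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        fbank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def log_mel(spec, fbank, eps=1e-10):
    """Stage 2: Mel-scale integration followed by log compression."""
    return np.log(spec @ fbank.T + eps)

def mfcc(logmel, n_ceps=13):
    """Stage 3: unnormalized DCT-II along the Mel axis, keeping the first n_ceps."""
    n_mels = logmel.shape[1]
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return logmel @ dct.T

# One second of synthetic noise stands in for a speech recording.
signal = np.random.default_rng(0).standard_normal(16000)
spec = power_spectrogram(signal)
lm = log_mel(spec, mel_filterbank())
cep = mfcc(lm)
print(spec.shape, lm.shape, cep.shape)
```

Each stage discards or reorganizes information from the previous one, so feeding an earlier stage to the DNN leaves more of the processing to be learned from data.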


DOI: 10.21437/Interspeech.2016-1552

Cite as

Martinez, A.M.C., Schädler, M.R. (2016) Why do ASR Systems Despite Neural Nets Still Depend on Robust Features. Proc. Interspeech 2016, 1883-1887.

Bibtex
@inproceedings{Martinez+2016,
author={Angel Mario Castro Martinez and Marc René Schädler},
title={Why do ASR Systems Despite Neural Nets Still Depend on Robust Features},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1552},
url={http://dx.doi.org/10.21437/Interspeech.2016-1552},
pages={1883--1887}
}