This paper studies the use of a hybrid context-dependent Deep Neural Network Hidden Markov Model (DNN-HMM) architecture for robust recognition of speech affected by real-world nonlinear distortions. We consider two types of distortion: a) signals distorted in the analog domain by an overgained microphone preamplifier, and b) recordings exhibiting unnatural spectral sparseness caused by excessive denoising or low-bit-rate compression. We compare the performance of the DNN-HMM architecture with that of a conventional system based on context-dependent Gaussian Mixture Model (GMM)-HMMs, which applies channel/speaker adaptation and/or feature compensation in the front-end via Histogram Equalization (HEQ). We show that the DNN-HMM architecture achieves a significantly lower Word Error Rate (WER) on the considered distorted datasets, with a relative WER reduction of more than 60%. We also investigate the usefulness of feature compensation via HEQ for a DNN-HMM system and show that it can be helpful in the case of shallower networks.
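For readers unfamiliar with HEQ, the following is a minimal sketch of the general idea, not the implementation used in the paper. It assumes a standard normal reference distribution and an empirical CDF estimated from ranks within a single utterance; the function names `histogram_equalize` and `equalize_features` are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def histogram_equalize(values):
    """Map one feature coefficient onto a standard normal reference.

    Assumptions (not taken from the paper): the reference distribution is
    N(0, 1), and the empirical CDF of the rank-r value (0-based) among n
    values is estimated as (r + 0.5) / n.
    """
    n = len(values)
    if n == 0:
        return []
    reference = NormalDist(mu=0.0, sigma=1.0)
    # Indices sorted by value; equal values keep their original order.
    order = sorted(range(n), key=lambda i: values[i])
    equalized = [0.0] * n
    for rank, idx in enumerate(order):
        cdf = (rank + 0.5) / n  # strictly inside (0, 1)
        equalized[idx] = reference.inv_cdf(cdf)
    return equalized


def equalize_features(frames):
    """Apply HEQ independently to each coefficient of a list of frames."""
    if not frames:
        return []
    per_dimension = [histogram_equalize(list(dim)) for dim in zip(*frames)]
    return [list(frame) for frame in zip(*per_dimension)]
```

Because the mapping depends only on the rank order of the values, any monotonic nonlinear distortion of a coefficient leaves the equalized output unchanged. That property is the usual motivation for applying HEQ to nonlinearly distorted features.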
Bibliographic reference. Seps, Ladislav / Malek, Jiri / Cerva, Petr / Nouza, Jan (2014): "Investigation of deep neural networks for robust recognition of nonlinearly distorted speech", In INTERSPEECH-2014, 363-367.