Optimizing DNN Adaptation for Recognition of Enhanced Speech

Marco Matassoni, Alessio Brutti, Daniele Falavigna

Speech enhancement performed directly with deep neural networks (DNNs) is of major interest because DNNs can tangibly reduce the impact of noisy conditions in speech recognition tasks. Adapting DNN-based acoustic models to new environmental conditions is a related and equally challenging topic. In this paper we analyze acoustic model adaptation in the presence of a disjoint speech enhancement component and identify an optimal setting for improving speech recognition performance. The adaptation is derived from a consolidated technique that adds a regularization term to the training objective to prevent overfitting. We propose to optimize the adaptation of the clean acoustic models towards the enhanced speech by tuning the regularization term according to the degree of enhancement. Experiments on a popular noisy dataset (e.g., AURORA-4) demonstrate the validity of the proposed approach.
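The abstract does not name the regularizer, so the following is only a minimal sketch of how a regularized adaptation objective of this kind can look. It assumes a KL-divergence-style regularization, in which the one-hot training targets are interpolated with the posteriors of the unadapted (clean) model. The function names and the weight `rho` are hypothetical and are not taken from the paper; `rho` merely stands in for the regularization term that the authors tune according to the degree of enhancement.

```python
import numpy as np

def softmax(logits):
    # Numerically stable row-wise softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def regularized_targets(labels, ref_posteriors, rho):
    # Assumption: KL-style regularization implemented as target interpolation.
    # rho = 0 gives plain one-hot targets (unregularized adaptation);
    # rho = 1 gives the unadapted model's posteriors (no effective adaptation).
    n_classes = ref_posteriors.shape[1]
    one_hot = np.eye(n_classes)[labels]
    return (1.0 - rho) * one_hot + rho * ref_posteriors

def adaptation_loss(adapted_logits, labels, ref_posteriors, rho):
    # Cross-entropy between the interpolated targets and the adapted model output,
    # averaged over frames.
    targets = regularized_targets(labels, ref_posteriors, rho)
    log_probs = np.log(softmax(adapted_logits) + 1e-12)
    return float(-(targets * log_probs).sum(axis=1).mean())
```

Under this assumption, a larger `rho` keeps the adapted model closer to the clean acoustic model, while a smaller `rho` lets it move further towards the enhanced speech. How `rho` should depend on the degree of enhancement is the subject of the paper and is not reproduced here.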

DOI: 10.21437/Interspeech.2017-755

Cite as: Matassoni, M., Brutti, A., Falavigna, D. (2017) Optimizing DNN Adaptation for Recognition of Enhanced Speech. Proc. Interspeech 2017, 724-728, DOI: 10.21437/Interspeech.2017-755.

  author={Marco Matassoni and Alessio Brutti and Daniele Falavigna},
  title={Optimizing DNN Adaptation for Recognition of Enhanced Speech},
  booktitle={Proc. Interspeech 2017},