Robustness Over Time-Varying Channels in DNN-HMM ASR Based Human-Robot Interaction

José Novoa, Jorge Wuth, Juan Pablo Escudero, Josué Fredes, Rodrigo Mahu, Richard M. Stern, Nestor Becerra Yoma


This paper addresses the problem of time-varying channels in speech-recognition-based human-robot interaction using Locally-Normalized Filter-Bank (LNFB) features and training strategies that compensate for microphone response and room acoustics. Testing utterances were generated by re-recording the Aurora-4 testing database with a PR2 mobile robot equipped with a Kinect audio interface, while the robot performed head rotations and moved toward and away from a fixed source. Three training conditions were evaluated: Clean, 1-IR and 33-IR. With Clean training, the DNN-HMM system was trained on the Aurora-4 clean training database. With 1-IR training, the same training data were convolved with an impulse response estimated at one meter from the source with no rotation of the robot head. With 33-IR training, the Aurora-4 training data were convolved with impulse responses estimated at one, two and three meters from the source and at 11 angular positions of the robot head. The 33-IR training method reduced WER by more than 50% relative to Clean training with both LNFB and conventional Mel filterbank (MelFB) features. Moreover, with 33-IR training, LNFB features yielded a WER 23% lower than MelFB features. Combined, 33-IR training and LNFB features reduced WER by 64% compared with Clean training and MelFB features.
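The 33-IR condition described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' pipeline. It assumes that the 33 impulse responses are the 3 distances crossed with the 11 head angles, that one impulse response is drawn at random per utterance, and that the convolved signal is truncated to the clean utterance length; none of these details is stated in the abstract. The impulse responses are synthetic placeholders, and all function and constant names are hypothetical. A real system would load measured responses and use an array library for the convolution.

```python
import random
from typing import Dict, List, Sequence, Tuple

DISTANCES_M = (1, 2, 3)  # source distances named in the abstract
NUM_ANGLES = 11          # number of head angular positions named in the abstract

IRKey = Tuple[int, int]  # (distance in meters, angle index)


def convolve(signal: Sequence[float], ir: Sequence[float]) -> List[float]:
    """Full linear convolution of length len(signal) + len(ir) - 1."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def make_placeholder_ir_bank(rng: random.Random,
                             taps: int = 8) -> Dict[IRKey, List[float]]:
    """Hypothetical stand-in for measured IRs: random taps with geometric decay."""
    bank: Dict[IRKey, List[float]] = {}
    for distance in DISTANCES_M:
        for angle in range(NUM_ANGLES):
            bank[(distance, angle)] = [
                rng.uniform(-1.0, 1.0) * (0.5 ** n) / distance
                for n in range(taps)
            ]
    return bank


def simulate_utterance(clean: Sequence[float],
                       bank: Dict[IRKey, List[float]],
                       rng: random.Random) -> Tuple[List[float], IRKey]:
    """Convolve one clean utterance with a randomly chosen IR (assumed policy)."""
    key = rng.choice(sorted(bank))
    reverberant = convolve(clean, bank[key])
    # Assumption: keep the original length so frame-level labels stay aligned.
    return reverberant[: len(clean)], key
```

Under these assumptions, 1-IR training corresponds to a bank holding only the one-meter, zero-rotation response, and Clean training skips the convolution altogether.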


DOI: 10.21437/Interspeech.2017-1308

Cite as: Novoa, J., Wuth, J., Escudero, J.P., Fredes, J., Mahu, R., Stern, R.M., Yoma, N.B. (2017) Robustness Over Time-Varying Channels in DNN-HMM ASR Based Human-Robot Interaction. Proc. Interspeech 2017, 839-843, DOI: 10.21437/Interspeech.2017-1308.


@inproceedings{Novoa2017,
  author={José Novoa and Jorge Wuth and Juan Pablo Escudero and Josué Fredes and Rodrigo Mahu and Richard M. Stern and Nestor Becerra Yoma},
  title={Robustness Over Time-Varying Channels in DNN-HMM ASR Based Human-Robot Interaction},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={839--843},
  doi={10.21437/Interspeech.2017-1308},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1308}
}