An Environmental Feature Representation for Robust Speech Recognition and for Environment Identification

Xue Feng, Brigitte Richardson, Scott Amman, James Glass


In this paper, we investigate environment feature representations, which we refer to as e-vectors, that can be used for environment adaptation in automatic speech recognition (ASR) and for environment identification. Inspired by the fact that i-vectors in the total variability space capture both speaker and channel environment variability, our proposed e-vectors are extracted from i-vectors. Two extraction methods are proposed: one via linear discriminant analysis (LDA) projection, and the other via a bottleneck deep neural network (BN-DNN). Our evaluations show that augmenting DNN-HMM ASR systems with the proposed e-vectors for environment adaptation significantly improves ASR performance. We also demonstrate that the proposed e-vectors yield promising results on environment identification.
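As a rough illustration of the two uses named above, the following minimal sketch shows one plausible way an utterance-level e-vector could be consumed once it has been extracted. It does not reproduce the paper's method: the abstract does not say how the e-vector is fed to the DNN or how environments are scored, so appending the vector to every acoustic frame and picking the nearest class centroid by cosine similarity are both assumptions. The function names (`augment_frames`, `identify_environment`), the labels `env_a` and `env_b`, and all numeric values are hypothetical.

```python
import math


def augment_frames(frames, e_vector):
    """Append the same utterance-level e-vector to every acoustic frame.

    Assumption: the DNN input is the frame features concatenated with the
    e-vector; the abstract only says the system is "augmented".
    """
    return [list(frame) + list(e_vector) for frame in frames]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def identify_environment(e_vector, centroids):
    """Return the label whose centroid is most similar to the e-vector.

    Assumption: nearest-centroid scoring with cosine similarity, chosen here
    only for illustration.
    """
    return max(centroids, key=lambda label: cosine(e_vector, centroids[label]))


# Toy data: two 3-dimensional acoustic frames and a 2-dimensional e-vector.
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
e_vector = [0.9, -0.1]

# Each frame grows from 3 to 5 dimensions.
augmented = augment_frames(frames, e_vector)

# Hypothetical per-environment centroids (e.g. mean training e-vectors).
centroids = {"env_a": [1.0, 0.0], "env_b": [0.0, 1.0]}
label = identify_environment(e_vector, centroids)
```

In this sketch the e-vector is constant across all frames of an utterance, so the network sees the same environment summary alongside each frame's acoustic features.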


DOI: 10.21437/Interspeech.2017-485

Cite as: Feng, X., Richardson, B., Amman, S., Glass, J. (2017) An Environmental Feature Representation for Robust Speech Recognition and for Environment Identification. Proc. Interspeech 2017, 3078-3082, DOI: 10.21437/Interspeech.2017-485.


@inproceedings{Feng2017,
  author={Xue Feng and Brigitte Richardson and Scott Amman and James Glass},
  title={An Environmental Feature Representation for Robust Speech Recognition and for Environment Identification},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3078--3082},
  doi={10.21437/Interspeech.2017-485},
  url={http://dx.doi.org/10.21437/Interspeech.2017-485}
}