Modulation Vectors as Robust Feature Representation for ASR in Domain Mismatched Conditions

Samik Sadhu, Hynek Hermansky


In this work, we demonstrate the robustness of Modulation Vectors (M-vectors) to domain mismatch between the training and test conditions of an Automatic Speech Recognition (ASR) system. Our work focuses on the specific task of dealing with mismatches caused by reverberation. We use simulated data from TIMIT and real reverberant speech from the REVERB challenge data to evaluate the performance of our system. The paper also describes a multistream system that combines information from Mel Frequency Cepstral Coefficients (MFCCs) and M-vectors to improve ASR performance on both matched and mismatched datasets. The proposed multistream system achieves a relative improvement of 25% in recognition accuracy on the mismatched condition, while an M-vector trained hybrid ASR system shows a 7–8% improvement in recognition accuracy, both with respect to an MFCC trained hybrid ASR system.


DOI: 10.21437/Interspeech.2019-2723

Cite as: Sadhu, S., Hermansky, H. (2019) Modulation Vectors as Robust Feature Representation for ASR in Domain Mismatched Conditions. Proc. Interspeech 2019, 3441-3445, DOI: 10.21437/Interspeech.2019-2723.


@inproceedings{Sadhu2019,
  author={Samik Sadhu and Hynek Hermansky},
  title={{Modulation Vectors as Robust Feature Representation for ASR in Domain Mismatched Conditions}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3441--3445},
  doi={10.21437/Interspeech.2019-2723},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2723}
}