Say What? A Dataset for Exploring the Error Patterns That Two ASR Engines Make

Meredith Moore, Michael Saxon, Hemanth Venkateswara, Visar Berisha, Sethuraman Panchanathan


We present a new metadataset that provides insight into where and how two ASR systems make errors on several different speech datasets. By making this data readily available to researchers, we hope to stimulate research on WER estimation models and, through it, a deeper understanding of how intelligibility is encoded in speech. Using this dataset, we attempted to estimate intelligibility with a state-of-the-art model for speech quality estimation and found that the model did not successfully model speech intelligibility. This finding sheds light on the relationship between how speech quality and intelligibility are encoded in acoustic features, and it shows that we have much more to learn about how to model intelligibility effectively. It is our hope that the metadataset will stimulate research into systems that model intelligibility more effectively.


DOI: 10.21437/Interspeech.2019-3096

Cite as: Moore, M., Saxon, M., Venkateswara, H., Berisha, V., Panchanathan, S. (2019) Say What? A Dataset for Exploring the Error Patterns That Two ASR Engines Make. Proc. Interspeech 2019, 2528-2532, DOI: 10.21437/Interspeech.2019-3096.


@inproceedings{Moore2019,
  author={Meredith Moore and Michael Saxon and Hemanth Venkateswara and Visar Berisha and Sethuraman Panchanathan},
  title={{Say What? A Dataset for Exploring the Error Patterns That Two ASR Engines Make}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2528--2532},
  doi={10.21437/Interspeech.2019-3096},
  url={http://dx.doi.org/10.21437/Interspeech.2019-3096}
}