ISCA Archive Interspeech 2013

Automatic human utility evaluation of ASR systems: does WER really predict performance?

Benoit Favre, Kyla Cheung, Siavash Kazemian, Adam Lee, Yang Liu, Cosmin Munteanu, Ani Nenkova, Dennis Ochei, Gerald Penn, Stephen Tratz, Clare Voss, Frauke Zeller

We propose an alternative evaluation metric to Word Error Rate (WER) for the decision audit task of meeting recordings, which exemplifies how to evaluate speech recognition within a legitimate application context. Using machine learning on an initial seed of human-subject experimental data, our alternative metric handily outperforms WER, which correlates very poorly with human subjects' success in finding decisions given ASR transcripts with a range of WERs.
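A minimal illustrative sketch of the kind of comparison the abstract describes: computing WER per transcript and checking how well it correlates with human task success. This is not the paper's method or data; the transcripts, success scores, and function names below are hypothetical, assumed only for illustration.

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

if __name__ == "__main__":
    reference = "the committee decided to postpone the budget review"
    hypotheses = [
        "the committee decided to postpone the budget review",
        "the committee decided to post pone the budget view",
        "a comedy decided opposed the budget review",
    ]
    wers = [wer(reference, h) for h in hypotheses]
    # Hypothetical per-transcript task-success scores (e.g. fraction of decisions found).
    success = [0.9, 0.8, 0.7]
    print("WERs:", [round(w, 2) for w in wers])
    print("Pearson(WER, task success):", round(pearson(wers, success), 2))

In this toy setup a low correlation between WER and task success would mirror the paper's finding; the proposed alternative metric instead learns a predictor of human utility from seed human-subject data rather than relying on transcript edit distance alone.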


doi: 10.21437/Interspeech.2013-610

Cite as: Favre, B., Cheung, K., Kazemian, S., Lee, A., Liu, Y., Munteanu, C., Nenkova, A., Ochei, D., Penn, G., Tratz, S., Voss, C., Zeller, F. (2013) Automatic human utility evaluation of ASR systems: does WER really predict performance? Proc. Interspeech 2013, 3463-3467, doi: 10.21437/Interspeech.2013-610

@inproceedings{favre13_interspeech,
  author={Benoit Favre and Kyla Cheung and Siavash Kazemian and Adam Lee and Yang Liu and Cosmin Munteanu and Ani Nenkova and Dennis Ochei and Gerald Penn and Stephen Tratz and Clare Voss and Frauke Zeller},
  title={{Automatic human utility evaluation of ASR systems: does WER really predict performance?}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={3463--3467},
  doi={10.21437/Interspeech.2013-610}
}