ISCA Archive ICSLP 1992

Evaluating interactive spoken language systems

David Goodine, Lynette Hirschman, Joseph Polifroni, Stephanie Seneff, Victor Zue

As the DARPA spoken language community moves towards developing useful systems for interactive problem solving, we must develop new evaluation metrics to assess whether these systems aid people in solving problems. In this paper, we report on experiments with two new metrics: task completion and logfile evaluation (where human evaluators judge query correctness). In one experiment, we used two variants of our data collection system (with a human transcriber) to compare an aggressive system using robust parsing to a more cautious "full-parse" system. In a second experiment, we compared a system using the human transcriber to a fully automated system using the speech recognizer. There were clear differences in task completion, time to task completion, and number of correct and incorrect answers. These experiments lead us to conclude that task completion and logfile evaluation are useful metrics for evaluating interactive systems.
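The metrics named in the abstract (task completion, time to task completion, and counts of correct and incorrect answers from logfile evaluation) can be illustrated with a small tabulation sketch. This is not the authors' evaluation code: the `Session` and `Query` record layouts, the judgment labels, and the `summarize` function are hypothetical names introduced here for illustration, and the sketch assumes each logged query has already been labeled by a human evaluator.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Query:
    # Hypothetical evaluator label for one logged query,
    # e.g. "correct" or "incorrect".
    judgment: str


@dataclass
class Session:
    system: str          # hypothetical system label, e.g. "robust" or "full-parse"
    completed: bool      # did the subject complete the task?
    seconds: float       # elapsed session time in seconds
    queries: List[Query] = field(default_factory=list)


def summarize(sessions: List[Session]) -> Dict[str, Dict[str, Optional[float]]]:
    """Tabulate per-system task completion and logfile-evaluation counts."""
    by_system: Dict[str, List[Session]] = {}
    for s in sessions:
        by_system.setdefault(s.system, []).append(s)

    summary: Dict[str, Dict[str, Optional[float]]] = {}
    for system, group in by_system.items():
        done = [s for s in group if s.completed]
        judgments = [q.judgment for s in group for q in s.queries]
        summary[system] = {
            # Fraction of sessions in which the task was completed.
            "completion_rate": len(done) / len(group),
            # Mean time over completed sessions only; None if none completed.
            "mean_seconds_to_completion": (
                mean(s.seconds for s in done) if done else None
            ),
            # Counts of evaluator judgments across all logged queries.
            "correct": judgments.count("correct"),
            "incorrect": judgments.count("incorrect"),
        }
    return summary
```

Grouping by system mirrors the paper's pairwise comparisons (robust versus full-parse, transcriber versus recognizer); the timing average is restricted to completed sessions so that abandoned tasks do not distort time to completion.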

doi: 10.21437/ICSLP.1992-58

Cite as: Goodine, D., Hirschman, L., Polifroni, J., Seneff, S., Zue, V. (1992) Evaluating interactive spoken language systems. Proc. 2nd International Conference on Spoken Language Processing (ICSLP 1992), 201-204, doi: 10.21437/ICSLP.1992-58

  author={David Goodine and Lynette Hirschman and Joseph Polifroni and Stephanie Seneff and Victor Zue},
  title={{Evaluating interactive spoken language systems}},
  booktitle={Proc. 2nd International Conference on Spoken Language Processing (ICSLP 1992)},