Second International Conference on Spoken Language Processing (ICSLP'92)

Banff, Alberta, Canada
October 13-16, 1992

Evaluating Interactive Spoken Language Systems

David Goodine, Lynette Hirschman, Joseph Polifroni, Stephanie Seneff, Victor Zue

Spoken Language Systems Group, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

As the DARPA spoken language community moves towards developing useful systems for interactive problem solving, we must develop new evaluation metrics to assess whether these systems aid people in solving problems. In this paper, we report on experiments with two new metrics: task completion and logfile evaluation (where human evaluators judge query correctness). In one experiment, we used two variants of our data collection system (with a human transcriber) to compare an aggressive system using robust parsing to a more cautious "full-parse" system. In a second experiment, we compared a system using the human transcriber to a fully automated system using the speech recognizer. There were clear differences in task completion, time to task completion, and number of correct and incorrect answers. These experiments lead us to conclude that task completion and logfile evaluation are useful metrics for evaluating interactive systems.


Bibliographic reference. Goodine, David / Hirschman, Lynette / Polifroni, Joseph / Seneff, Stephanie / Zue, Victor (1992): "Evaluating interactive spoken language systems", in Proc. ICSLP-1992, 201-204.