7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

DARPA Communicator Evaluation: Progress from 2000 to 2001

Marilyn A. Walker (1), Alexander I. Rudnicky (2), John Aberdeen (3), Elizabeth Owen Bratt (4), John S. Garofolo (5), Helen Hastie (1), Audrey N. Le (5), Bryan Pellom (6), Alex Potamianos (7), Rebecca Passonneau (1), Rashmi Prasad (1), Salim Roukos (8), Gregory A. Sanders (5), Stephanie Seneff (9), David Stallard (10)

(1) AT&T Labs, USA; (2) Carnegie Mellon University, USA; (3) MITRE, USA; (4) SRI, USA; (5) National Institute of Standards and Technology, USA; (6) University of Colorado at Boulder, USA; (7) Lucent Technologies, USA; (8) IBM, USA; (9) MIT Laboratory for Computer Science, USA; (10) BBN Technologies, USA

This paper describes the evaluation methodology and results of the DARPA Communicator spoken dialog system evaluation experiments in 2000 and 2001. Nine spoken dialog systems in the travel planning domain participated in the experiments, resulting in a total corpus of 1904 dialogs. We describe and compare the experimental designs of the 2000 and 2001 DARPA evaluations, and explain how we established a performance baseline for complex tasks in 2001. We present our overall approach to data collection, the metrics collected, and the application of PARADISE to these data sets. We then compare the 2000 results for a number of core metrics with those for 2001. These results demonstrate large performance improvements from 2000 to 2001 and show that the Communicator program goal of conversational interaction for complex tasks has been achieved.

