7th International Conference on Spoken Language Processing
September 16-20, 2002
This paper describes the evaluation methodology and results of the DARPA Communicator spoken dialog system evaluation experiments in 2000 and 2001. Nine spoken dialog systems in the travel planning domain participated in the experiments, resulting in a total corpus of 1904 dialogs. We describe and compare the experimental designs of the 2000 and 2001 DARPA evaluations, and explain how we established a performance baseline for complex tasks in 2001. We present our overall approach to data collection, the metrics collected, and the application of the PARADISE evaluation framework to these data sets. We then compare the 2000 and 2001 results on a number of core metrics. These results demonstrate large performance improvements from 2000 to 2001 and show that the Communicator program goal of conversational interaction for complex tasks has been achieved.
Bibliographic reference. Walker, Marilyn A. / Rudnicky, Alexander I. / Aberdeen, John / Bratt, Elizabeth Owen / Garofolo, John S. / Hastie, Helen / Le, Audrey N. / Pellom, Bryan / Potamianos, Alex / Passonneau, Rebecca / Prasad, Rashmi / Roukos, Salim / Sanders, Gregory A. / Seneff, Stephanie / Stallard, David (2002): "DARPA communicator evaluation: progress from 2000 to 2001", In ICSLP-2002, 273-276.