7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

The Carnegie Mellon Communicator Corpus

Christina Bennett, Alexander I. Rudnicky

Carnegie Mellon University, USA

As part of the DARPA Communicator program, Carnegie Mellon has, over the past three years, collected a large corpus of speech produced by callers to its Travel Planning system. To date, a total of 180,605 utterances (90.9 hours) have been collected. The data have been used for a number of purposes, including acoustic and language modeling and the development of a spoken dialog system. Collecting, transcribing and annotating these data led us to develop procedures for managing the transcription process and for ensuring its accuracy. We describe these procedures, as well as some results based on the data. A portion of this corpus, covering the years 1999-2001, is being published for research purposes.



Bibliographic reference. Bennett, Christina / Rudnicky, Alexander I. (2002): "The Carnegie Mellon Communicator Corpus", In ICSLP-2002, 341-344.