8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Shared Resources for Robust Speech-to-Text Technology

Stephanie Strassel, David Miller, Kevin Walker, Christopher Cieri

University of Pennsylvania, USA

This paper describes ongoing efforts at the Linguistic Data Consortium to create shared resources for improved speech-to-text (STT) technology. Under the DARPA EARS program, technology providers are charged with creating STT systems whose outputs are substantially richer and much more accurate than is currently possible. These aggressive program goals motivate new approaches to corpus creation and distribution. EARS participants require multilingual broadcast and telephone speech data, transcripts, and annotations at a much higher volume than any previous program. Because standard approaches to resource collection and creation are prohibitively expensive at this volume, new methods have been established within EARS to allow for the development of vast quantities of audio, transcripts, and annotations. New distribution methods also provide for efficient deployment of needed resources to participating research sites and enable eventual publication to a wider community of language researchers.

Bibliographic reference. Strassel, Stephanie / Miller, David / Walker, Kevin / Cieri, Christopher (2003): "Shared resources for robust speech-to-text technology", in EUROSPEECH-2003, 1609-1612.