10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Transcribing Human-Directed Speech for Spoken Language Processing

Mari Ostendorf

University of Washington, USA

As storage costs drop and bandwidth increases, the amount of spoken information available via the web or in online archives has grown rapidly, raising problems of document retrieval, information extraction, summarization, and translation for spoken language. While there is a long tradition of research on these technologies for text, new challenges arise when moving from written to spoken language. In this talk, we look at the differences between speech and text, and at how information in the speech signal beyond the words can be leveraged to provide structural information in a rich, automatically generated transcript that better serves language processing applications. In particular, we look at three interrelated types of structure (orthographic, prosodic, and syntactic), methods for detecting them automatically, the benefit of optimizing rich transcription for the target language processing task, and the impact of this structural information on tasks such as information extraction, translation, and summarization.

Bibliographic reference. Ostendorf, Mari (2009): "Transcribing human-directed speech for spoken language processing", in INTERSPEECH-2009, 21-27.