ISCA Archive Interspeech 2009

Transcribing human-directed speech for spoken language processing

Mari Ostendorf

As storage costs drop and bandwidth increases, there has been a rapid growth of spoken information available via the web or in online archives, raising problems of document retrieval, information extraction, summarization and translation for spoken language. While there is a long tradition of research in these technologies for text, new challenges arise when moving from written to spoken language. In this talk, we look at differences between speech and text, and how we can leverage the information in the speech signal beyond the words to provide structural information in a rich, automatically generated transcript that better serves language processing applications. In particular, we look at three interrelated types of structure (orthographic, prosodic, and syntactic), methods for automatic detection, the benefit of optimizing rich transcription for the target language processing task, and the impact of this structural information in tasks such as information extraction, translation, and summarization.


doi: 10.21437/Interspeech.2009-4

Cite as: Ostendorf, M. (2009) Transcribing human-directed speech for spoken language processing. Proc. Interspeech 2009, 21-27, doi: 10.21437/Interspeech.2009-4

@inproceedings{ostendorf09_interspeech,
  author={Mari Ostendorf},
  title={{Transcribing human-directed speech for spoken language processing}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={21--27},
  doi={10.21437/Interspeech.2009-4}
}