Structural metadata annotation: moving beyond English

Stephanie Strassel, Jáchym Kolár, Zhiyi Song, Leila Barclay, Meghan Glenn

The goal of metadata extraction (MDE) is to enable technology that can take raw speech-to-text output and refine it into forms that are more useful to humans and to downstream automatic processes. Starting in 2003, a structural metadata annotation task was defined for English as part of the DARPA EARS Program. A significant new challenge for MDE is the addition of new languages. This paper reports on work undertaken to apply MDE annotation to data from three very different languages: Mandarin Chinese, Levantine Arabic, and conversational Czech. Details of annotation task modifications are provided for each language, along with a general overview of data and annotation tools for non-English MDE.

doi: 10.21437/Interspeech.2005-453

Cite as: Strassel, S., Kolár, J., Song, Z., Barclay, L., Glenn, M. (2005) Structural metadata annotation: moving beyond English. Proc. Interspeech 2005, 1545-1548, doi: 10.21437/Interspeech.2005-453

  author={Stephanie Strassel and Jáchym Kolár and Zhiyi Song and Leila Barclay and Meghan Glenn},
  title={{Structural metadata annotation: moving beyond English}},
  booktitle={Proc. Interspeech 2005},