The goal of metadata extraction (MDE) is to enable technology that can take raw speech-to-text output and refine it into forms that are more useful to humans and to downstream automatic processes. Starting in 2003, a structural metadata annotation task was defined for English as part of the DARPA EARS Program. A significant new challenge for MDE is extending it to new languages. This paper reports on work undertaken to apply MDE annotation to data from three very different languages: Mandarin Chinese, Levantine Arabic, and conversational Czech. Details of the annotation task modifications are provided for each language, along with a general overview of the data and annotation tools for non-English MDE.
Cite as: Strassel, S., Kolár, J., Song, Z., Barclay, L., Glenn, M. (2005) Structural metadata annotation: moving beyond English. Proc. Interspeech 2005, 1545-1548, doi: 10.21437/Interspeech.2005-453
@inproceedings{strassel05_interspeech, author={Stephanie Strassel and Jáchym Kolár and Zhiyi Song and Leila Barclay and Meghan Glenn}, title={{Structural metadata annotation: moving beyond English}}, year=2005, booktitle={Proc. Interspeech 2005}, pages={1545--1548}, doi={10.21437/Interspeech.2005-453} }