Speech Prosody 2004

Nara, Japan
March 23-26, 2004

Querying Annotated Speech Corpora

Ulrike Gut (1), Jan-Torsten Milde (2), Holger Voormann (3), Ulrich Heid (3)

(1) University of Freiburg, Germany; (2) Polytechnical University Aalen, Germany; (3) IMS, University of Stuttgart, Germany

This paper is concerned with querying annotated speech corpora. A growing number of such corpora are currently being created worldwide; however, their usefulness to the wider research community is restricted by the lack of standard tools for creating, editing, annotating, storing and querying them. Two solutions to these problems are presented here: the XML-based data format TASX for corpus creation and data format exchange, and the NXT search tool for querying corpora. Both have been applied to the multi-level annotated LeaP corpus of non-native speech.

