8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Construction of a Phonotactic Dialect Corpus Using Semiautomatic Annotation

Reva Schwartz (1), Wade Shen (2), Joseph Campbell (2), Shelley Paget (3), Julie Vonwiller (3), Dominique Estival (3), Christopher Cieri (4)

(1) United States Secret Service, USA
(2) MIT, USA
(3) Appen Pty. Ltd., Australia
(4) Linguistic Data Consortium, USA

In this paper, we discuss rapid, semiautomatic techniques for annotating detailed phonological phenomena in large corpora. We describe the use of these techniques for the development of a corpus of American English dialects. The resulting annotations and corpora will support both large-scale linguistic dialect analysis and automatic dialect identification. We delineate the semiautomatic annotation process that we are currently employing and a set of experiments we ran to validate this process. From these experiments, we learned that the use of ASR techniques could significantly increase the throughput and consistency of human annotators.

Bibliographic reference. Schwartz, Reva / Shen, Wade / Campbell, Joseph / Paget, Shelley / Vonwiller, Julie / Estival, Dominique / Cieri, Christopher (2007): "Construction of a phonotactic dialect corpus using semiautomatic annotation", In INTERSPEECH-2007, 942-945.