10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

The Broadcast Narrow Band Speech Corpus: A New Resource Type for Large Scale Language Recognition

Christopher Cieri (1), Linda Brandschain (1), Abby Neely (1), David Graff (1), Kevin Walker (1), Chris Caruso (1), Alvin F. Martin (2), Craig S. Greenberg (2)

(1) University of Pennsylvania, USA

This paper describes a new resource type, broadcast narrow band speech, for use in large-scale language recognition research and technology development. After providing the rationale for this new resource type, the paper describes the collection, segmentation, and auditing procedures and the data formats used. Along the way, it addresses the issues of defining language and dialect in found data and of establishing ground truth for this corpus.


Bibliographic reference. Cieri, Christopher / Brandschain, Linda / Neely, Abby / Graff, David / Walker, Kevin / Caruso, Chris / Martin, Alvin F. / Greenberg, Craig S. (2009): "The broadcast narrow band speech corpus: a new resource type for large scale language recognition", In INTERSPEECH-2009, 2867-2870.