Fourth International Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU-2014)

St. Petersburg, Russia
May 14-16, 2014

Community-Based Resource Building and Data Collection

Kristiina Jokinen (1), Graham Wilcock (2)

(1) Institute of Behavioural Sciences; (2) Department of Modern Languages; University of Helsinki, Finland

The paper describes our work on participatory, community-based resource collection for the Sami language. This included community events where participants wrote new Sami Wikipedia articles and contributed to speech data collection, both by reading Sami Wikipedia articles aloud and by taking part in free-form group conversations. The aim was twofold: to increase the number of Sami Wikipedia articles, thereby strengthening Wikipedia as a digital resource for the Sami language, and to collect speech data for developing Sami speech components. These components are intended to be combined with the Sami Wikipedia to build a spoken interactive knowledge access system.

Index Terms: language resources development, Wikipedia, Sami language, community-based participatory data collection


Bibliographic reference. Jokinen, Kristiina / Wilcock, Graham (2014): "Community-based resource building and data collection", In SLTU-2014, 201-206.