ISCA Archive SLTU 2014

Community-based resource building and data collection

Kristiina Jokinen, Graham Wilcock

The paper describes our work on participatory, community-based resource collection for the Sami language. The work included community events in which participants wrote new Sami Wikipedia articles and took part in speech data collection by reading Sami Wikipedia articles aloud and conversing freely in groups. The aims were to increase the number of Sami Wikipedia articles, thereby strengthening Wikipedia as a digital resource for the Sami language, and to collect speech data for developing Sami speech components. These components are intended to be combined with the Sami Wikipedia to build a spoken interactive knowledge access system.

Index Terms: language resources development, Wikipedia, Sami language, community-based participatory data collection


Cite as: Jokinen, K., Wilcock, G. (2014) Community-based resource building and data collection. Proc. 4th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2014), 201-206

@inproceedings{jokinen14_sltu,
  author={Kristiina Jokinen and Graham Wilcock},
  title={{Community-based resource building and data collection}},
  year=2014,
  booktitle={Proc. 4th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2014)},
  pages={201--206}
}