Third Workshop on Spoken Language Technologies for Under-resourced Languages
Cape Town, South Africa
The purpose of the Almannarómur project is to collect data for a speech corpus (database) for Icelandic. Its main aim is to create an open-source speech project that enables research and development in Icelandic language technology. The database is particularly suitable for acoustic modelling for speech recognition, but it could also be used for other purposes, such as developing a speaker recognition system or analyzing prosody. The project is run by Reykjavik University and the Icelandic Centre for Language Technology in cooperation with Google, which provided technical support. A total of 563 participants took part, each providing around 219 read sentences on average. This paper gives a short introduction to Icelandic language technology, describes how the text corpus was constructed for the database, and presents how the recording effort was organized, along with its main results.
Index Terms: Icelandic, Speech Recording, Corpus Creation, Automatic Speech Recognition
Bibliographic reference. Guðnason, Jón / Kjartansson, Oddur / Jóhannsson, Jökull / Carstensdóttir, Elín / Vilhjálmsson, Hannes Högni / Loftsson, Hrafn / Helgadóttir, Sigrún / Jóhannsdóttir, Kristín M. / Rögnvaldsson, Eiríkur (2012): "Almannarómur: an open Icelandic speech corpus", In SLTU-2012, 80-83.