Fourth International Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU-2014)

St. Petersburg, Russia
May 14-16, 2014

The Development of New Corpora for Under-Resourced Languages using Data Available for Well-Resourced Ones

Pavel Skrelin (1), Nina Volskaya (1), Karina Evgrafova (1), Riikka Ullakonoja (2)

(1) Saint-Petersburg State University, Russia
(2) University of Jyväskylä, Finland

In this paper, we propose to exploit existing corpora of well-resourced languages as a basis for developing similar corpora for under-resourced ones. Constructing corpora of this type will make it possible to find common patterns in the acoustic manifestation of similar functional states, regardless of the language. Analysis of these corpora will also make it possible to investigate universal and language-specific features reflected in speech. Two pilot experiments that may contribute to the proposed strategy are presented.

Index Terms: under-resourced languages, parallel speech corpora, acoustics, intonation

Bibliographic reference. Skrelin, Pavel / Volskaya, Nina / Evgrafova, Karina / Ullakonoja, Riikka (2014): "The development of new corpora for under-resourced languages using data available for well-resourced ones", In SLTU-2014, 243-246.