Third Workshop on Spoken Language Technologies for Under-resourced Languages, Cape Town, South Africa
In this paper, we present a survey of state-of-the-art systems for automatic processing and recognition of under-resourced languages of Eastern Europe, in particular the East Slavic languages (Ukrainian, Belarusian and Russian), which share several prominent features, including the Cyrillic alphabet, phonetic classes, the morphological structure of word-forms and a relatively free grammar. A large-vocabulary Russian speech recognizer developed by SPIIRAS is described, with special attention paid to grapheme-to-phoneme conversion for automatic creation of a pronunciation vocabulary and to acoustic modeling at the system training stage. Speech recognition results are reported for a very large vocabulary of more than 200K word-forms.
Index Terms: Slavic languages, automatic speech recognition (ASR), Russian language, pronunciation vocabulary, grapheme-to-phoneme conversion
Bibliographic reference. Karpov, Alexey / Kipyatkova, Irina / Ronzhin, Andrey (2012): "Speech recognition for east Slavic languages: the case of Russian", In SLTU-2012, 84-89.