Third Workshop on Spoken Language Technologies for Under-resourced Languages

Cape Town, South Africa
May 7-9, 2012

Speech Recognition for East Slavic Languages: The Case of Russian

Alexey Karpov (1), Irina Kipyatkova (2), Andrey Ronzhin (1)

(1) Saint-Petersburg State University, Department of Phonetics, Russia
(2) SPIIRAS Institute, Speech and Multimodal Interfaces Laboratory, Russia

In this paper, we present a survey of state-of-the-art systems for automatic processing and recognition of under-resourced languages of Eastern Europe, in particular the East Slavic languages (Ukrainian, Belarusian and Russian), which share several prominent features, including the Cyrillic alphabet, phonetic classes, the morphological structure of word-forms and relatively free grammar. A large-vocabulary Russian speech recognizer developed by SPIIRAS is described, with special attention paid to grapheme-to-phoneme conversion for automatic creation of a pronunciation vocabulary and to acoustic modeling at the system training stage. Speech recognition results are reported for a very large vocabulary of more than 200K word-forms.

Index Terms: Slavic languages, automatic speech recognition (ASR), Russian language, pronunciation vocabulary, grapheme-to-phoneme conversion

Bibliographic reference. Karpov, Alexey / Kipyatkova, Irina / Ronzhin, Andrey (2012): "Speech recognition for East Slavic languages: the case of Russian", in SLTU-2012, 84-89.