12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Very Large Vocabulary ASR for Spoken Russian with Syntactic and Morphemic Analysis

Alexey Karpov, Irina Kipyatkova, Andrey Ronzhin

Russian Academy of Sciences, Russia

In this paper, we present a word-based very large vocabulary automatic speech recognition system for Russian. We propose novel methods for organizing the lexicon and the language model. A two-level morpho-phonemic prefix graph, which uses information on the morphemic structure of lexical units, is suggested as a compact representation of the pronunciation vocabulary and the search space. Such a model is more compact than a lexical tree or a linear vocabulary, and it speeds up the recognition process. Syntactic analysis of the training text corpus, in combination with statistical analysis, is suggested for generating N-gram language models. The syntax-based Russian language model takes into account long-distance syntactic dependencies between word pairs. The results show that the syntactic-statistical language model gives a 5% relative improvement in word and letter error rates with respect to the baseline models.
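
As an illustration of combining syntactic and statistical analysis for word-pair counts, the following is a minimal sketch and not the authors' implementation. It assumes that an external syntactic parser has already produced head/dependent index pairs for each sentence; the function name `count_bigrams`, the input layout, and the English example sentence are all hypothetical. Adjacent word pairs are counted as in an ordinary bigram model, and syntactically linked pairs that are not adjacent are added in surface order.

```python
from collections import Counter


def count_bigrams(sentences, dependencies):
    """Count adjacent word pairs plus non-adjacent syntactically linked pairs.

    sentences: list of token lists.
    dependencies: list parallel to `sentences`; each item is a list of
        (head_index, dependent_index) pairs, assumed to come from some
        external syntactic parser (hypothetical input format).
    """
    counts = Counter()
    for tokens, deps in zip(sentences, dependencies):
        # Ordinary statistical bigrams: positions (0, 1), (1, 2), ...
        adjacent = {(i, i + 1) for i in range(len(tokens) - 1)}
        for i, j in adjacent:
            counts[(tokens[i], tokens[j])] += 1
        # Syntactic pairs, stored in surface (left-to-right) order.
        # Pairs that are already adjacent are skipped to avoid double counting.
        for head, dependent in deps:
            i, j = min(head, dependent), max(head, dependent)
            if i != j and (i, j) not in adjacent:
                counts[(tokens[i], tokens[j])] += 1
    return counts


# Toy example: "dog" (index 1) is the subject of "slept" (index 4),
# a dependency that a plain bigram model over adjacent words cannot see.
sentences = [["the", "dog", "that", "barked", "slept"]]
dependencies = [[(4, 1), (1, 0)]]  # (head, dependent) index pairs
counts = count_bigrams(sentences, dependencies)
```

In this toy sentence the pair ("dog", "slept") receives a count even though three positions separate the two words, while the adjacent dependency between "dog" and "the" is not counted twice. How the counts are weighted or smoothed afterwards is left open here.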


Bibliographic reference.  Karpov, Alexey / Kipyatkova, Irina / Ronzhin, Andrey (2011): "Very large vocabulary ASR for spoken Russian with syntactic and morphemic analysis", In INTERSPEECH-2011, 3161-3164.