Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Review of Statistical Modeling of Highly Inflected Lithuanian Using Very Large Vocabulary

Airenas Vaiciunas, Gailius Raskinis

Vytautas Magnus University, Lithuania

This paper presents state-of-the-art language modeling (LM) of Lithuanian, a highly inflected language with free word order. Perplexities and word error rates (WER) of standard n-gram, class-based, cache-based, topic mixture and morphological LMs were estimated and compared for a vocabulary of more than 1 million words. WER estimates were obtained by solving a speaker-dependent ASR task in which LMs were used to rescore acoustic hypotheses. LM perplexity appeared to be uncorrelated with WER. Cache-based language models yielded the greatest perplexity improvement, while class-based language models achieved the greatest, though insignificant, WER improvement over the baseline 3-gram.
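The two evaluation measures named in the abstract, perplexity and WER, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names `perplexity` and `word_error_rate` are hypothetical, the language model is assumed to supply natural-log word probabilities, and WER is assumed to be the standard word-level edit distance divided by the reference length.

```python
import math


def perplexity(log_probs):
    """Perplexity of a word sequence given per-word natural-log probabilities.

    Assumption: log_probs[i] is ln P(w_i | history) as assigned by some LM.
    """
    return math.exp(-sum(log_probs) / len(log_probs))


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length.

    Computed as the word-level Levenshtein distance, keeping only one
    previous row of the dynamic-programming table at a time.
    """
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

Because perplexity is computed on text alone while WER also depends on the acoustic scores being rescored, an improvement in one need not carry over to the other, which is consistent with the lack of correlation reported above.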


Bibliographic reference. Vaiciunas, Airenas / Raskinis, Gailius (2005): "Review of statistical modeling of highly inflected Lithuanian using very large vocabulary", in INTERSPEECH-2005, 1321-1324.