Interspeech'2005 - Eurospeech
This paper presents state-of-the-art language modeling (LM) of Lithuanian, a highly inflected language with free word order. Perplexities and word error rates (WER) of standard n-gram, class-based, cache-based, topic mixture, and morphological LMs were estimated and compared for a vocabulary of more than 1 million words. WER estimates were obtained by solving a speaker-dependent ASR task in which the LMs were used to rescore acoustic hypotheses. LM perplexity appeared to be uncorrelated with WER. Cache-based language models yielded the greatest perplexity improvement, while class-based language models achieved the greatest, though insignificant, WER improvement over the baseline 3-gram.
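For readers unfamiliar with the two metrics compared in the abstract, the following is a minimal sketch of how perplexity and WER are conventionally computed. It is not the authors' evaluation code: the function names `perplexity` and `word_error_rate` are hypothetical, the inputs are assumed to be per-word natural-log probabilities and whitespace-tokenized transcripts, and the sample values in the usage note are toy data rather than results from the paper.

```python
import math


def perplexity(log_probs):
    """Perplexity from per-word natural-log probabilities assigned by an LM."""
    # exp of the negative average log-probability per word
    return math.exp(-sum(log_probs) / len(log_probs))


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

As a toy illustration, a model that assigns probability 0.25 to each of four words has a perplexity of about 4, and scoring the hypothesis "a x c" against the reference "a b c d" counts one substitution and one deletion over four reference words. Because perplexity is computed on text alone while WER depends on which acoustic hypotheses are confusable, a gain in one need not translate into a gain in the other, which is consistent with the lack of correlation reported above.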
Bibliographic reference. Vaiciunas, Airenas / Raskinis, Gailius (2005): "Review of statistical modeling of highly inflected Lithuanian using very large vocabulary", in INTERSPEECH-2005, 1321-1324.