INTERSPEECH 2006 - ICSLP
This paper addresses the use of an automatic decomposition method to reduce lexical variety and thereby improve speech recognition of less well-represented languages. The Amharic language has been selected for these experiments since only a small quantity of resources is available compared to well-covered languages. Inspired by the Harris algorithm, the method automatically generates plausible affixes that, combined with decompounding, can reduce the size of the lexicon and the out-of-vocabulary (OOV) rate. Recognition experiments are carried out for four different configurations (full-word and decompounded), using supervised training on a corpus containing only two hours of manually transcribed data.
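As a rough illustration of the two quantities the abstract refers to, the sketch below computes classic Harris-style successor-variety counts (the number of distinct letters that follow each prefix of a word in a lexicon) and a token-level OOV rate. It is not the authors' procedure, which is only described as inspired by the Harris algorithm. The function names `successor_varieties` and `oov_rate` and the toy English lexicon are assumptions made for illustration and are not taken from the paper or its Amharic data.

```python
from collections import defaultdict


def successor_varieties(word, lexicon):
    """For each proper prefix of `word`, count the distinct letters
    that follow that prefix anywhere in `lexicon` (hypothetical helper)."""
    followers = defaultdict(set)
    for entry in lexicon:
        for i in range(1, len(entry)):
            followers[entry[:i]].add(entry[i])
    return [len(followers[word[:i]]) for i in range(1, len(word))]


def oov_rate(tokens, lexicon):
    """Fraction of running tokens that are absent from the lexicon."""
    vocab = set(lexicon)
    return sum(token not in vocab for token in tokens) / len(tokens)


# Toy English data, purely illustrative.
toy_lexicon = ["reads", "reader", "reading", "readable", "red", "rope"]

# One count per prefix: "r", "re", "rea", "read", "readi", "readin".
counts = successor_varieties("reading", toy_lexicon)

# A peak in the counts suggests a plausible morph boundary after that prefix.
peak_prefix = "reading"[: counts.index(max(counts)) + 1]

# "rereads" is not in the toy lexicon, so one of three tokens is OOV.
rate = oov_rate(["reading", "reads", "rereads"], toy_lexicon)
```

In this toy case the count peaks after the prefix "read", which is the kind of evidence a Harris-style method can use to propose "read" + "ing" as a split. Splitting words this way tends to shrink the lexicon and lower the OOV rate, which is the effect the abstract describes.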
Bibliographic reference. Pellegrini, Thomas / Lamel, Lori (2006): "Investigating automatic decomposition for ASR in less represented languages", In INTERSPEECH-2006, paper 1776-Mon2A2O.4.