Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Investigating Automatic Decomposition for ASR in Less Represented Languages

Thomas Pellegrini, Lori Lamel

LIMSI-CNRS, Orsay, France

This paper addresses the use of an automatic decomposition method to reduce lexical variety and thereby improve speech recognition of less well-represented languages. Amharic was selected for these experiments because only a small quantity of resources is available for it compared with well-covered languages. Inspired by the Harris algorithm, the method automatically generates plausible affixes that, combined with decompounding, can reduce the size of the lexicon and the out-of-vocabulary (OOV) rate. Recognition experiments are carried out for four different configurations (full-word and decompounded), using supervised training with a corpus containing only two hours of manually transcribed data.
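As an illustration of the kind of technique the abstract cites as inspiration, the following is a minimal sketch of Harris-style letter successor variety, in which a word is cut after any prefix that can be followed by many distinct letters in the lexicon. It is not the authors' method, whose details are not given in the abstract. The function names, the threshold value, and the toy English word list are all assumptions chosen for illustration; the paper itself works on Amharic.

```python
from collections import defaultdict

def successor_variety(words):
    """Map each prefix to the set of distinct symbols that follow it in the lexicon."""
    successors = defaultdict(set)
    for word in words:
        for i in range(1, len(word)):
            successors[word[:i]].add(word[i])
        successors[word].add("#")  # hypothetical end-of-word marker
    return successors

def segment(word, successors, threshold=2):
    """Cut the word after every prefix whose successor variety reaches the threshold."""
    cuts = [
        i for i in range(1, len(word))
        if len(successors.get(word[:i], ())) >= threshold
    ]
    bounds = [0] + cuts + [len(word)]
    return [word[a:b] for a, b in zip(bounds, bounds[1:])]

# Toy lexicon (assumed, English for readability).
lexicon = ["walk", "walks", "walked", "walking",
           "talk", "talks", "talked", "talking"]
variety = successor_variety(lexicon)

print(segment("walking", variety))  # ['walk', 'ing']
print(segment("talked", variety))   # ['talk', 'ed']
print(segment("walk", variety))     # ['walk']
```

In this toy example the prefix "walk" is followed by four distinct symbols (end of word, "s", "e", "i"), so it is treated as a stem boundary, and the split-off endings are candidate affixes. Replacing full words by such stems and affixes is one way a lexicon can shrink, since a few shared units cover many surface forms.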

Bibliographic reference.  Pellegrini, Thomas / Lamel, Lori (2006): "Investigating automatic decomposition for ASR in less represented languages", In INTERSPEECH-2006, paper 1776-Mon2A2O.4.