2nd Workshop on Spoken Language Technologies for Under-Resourced Languages

Universiti Sains, Penang, Malaysia
May 3-5, 2010

Morpheme-Based Automatic Speech Recognition for a Morphologically Rich Language - Amharic

Martha Yifiru Tachbelie, Solomon Teferra Abate, Wolfgang Menzel

Department of Informatics, University of Hamburg, Germany

Out-of-vocabulary (OOV) words are a major source of errors in speech recognition systems, and various methods have been proposed to improve recognition performance by handling them properly. This paper presents an automatic speech recognition experiment conducted to examine the effect of OOV words on the performance of a speech recognition system for Amharic, a morphologically rich language. We tried to solve the OOV problem by using morphemes as dictionary and language model units. We found that, for a small-vocabulary (5k) system, morphemes are better lexical and language modeling units than words: using a morph-based vocabulary yielded an absolute improvement of 11.57% in word recognition accuracy. For large vocabularies, however, morpheme-based systems did not bring much performance improvement, because they suffer from acoustic confusability and limited language model scope, while word-based recognizers benefit considerably from the reduction in OOV rate.
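Why morpheme units reduce the OOV rate can be illustrated with a small sketch. It assumes a toy corpus of transliterated Amharic-like word forms and a hypothetical lexicon-based segmenter (`STEMS`, `PREFIXES`, `SUFFIXES`, `segment` are all illustrative inventions, not the segmentation method used in the paper). The OOV rate is the percentage of test tokens not covered by the `vocab_size` most frequent training tokens; the value 5000 simply mirrors the 5k vocabulary mentioned above.

```python
from collections import Counter

# Hypothetical toy morpheme inventory (illustration only, not the paper's segmenter).
STEMS = {"bet", "lij"}
PREFIXES = ("", "be", "ye")
SUFFIXES = ("", "och", "u")


def segment(word):
    """Split a word into prefix+ / stem / +suffix if its stem is in STEMS; else keep it whole."""
    for p in PREFIXES:
        for s in SUFFIXES:
            if word.startswith(p) and word.endswith(s):
                stem = word[len(p):len(word) - len(s)]
                if stem in STEMS:
                    # Empty affixes are dropped; affix markers keep morphs distinguishable.
                    return [m for m in (p and p + "+", stem, s and "+" + s) if m]
    return [word]  # unanalysable words stay as a single unit


def oov_rate(train_tokens, test_tokens, vocab_size):
    """Percentage of test tokens missing from the vocab_size most frequent training tokens."""
    if not test_tokens:
        return 0.0
    vocab = {tok for tok, _ in Counter(train_tokens).most_common(vocab_size)}
    misses = sum(1 for tok in test_tokens if tok not in vocab)
    return 100.0 * misses / len(test_tokens)


# Toy training and test word lists (assumed data, not from the paper's corpus).
train_words = ["bet", "betoch", "lij", "yebet", "lijoch"]
test_words = ["yebetoch", "belij", "lij", "betu"]

# Morpheme-level corpora: every word replaced by its segmentation.
train_morphs = [m for w in train_words for m in segment(w)]
test_morphs = [m for w in test_words for m in segment(w)]

word_oov = oov_rate(train_words, test_words, 5000)
morph_oov = oov_rate(train_morphs, test_morphs, 5000)
print(f"word OOV: {word_oov:.1f}%  morph OOV: {morph_oov:.1f}%")
# prints "word OOV: 75.0%  morph OOV: 25.0%"
```

Unseen combinations such as `yebetoch` are covered at the morpheme level because their parts occur in training. The trade-off noted in the abstract is also visible here: the morph-level rate is measured over more and shorter tokens, which are acoustically more confusable and give an n-gram model of fixed order a shorter span of words.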

Index Terms: Out-of-Vocabulary problem, Morpheme-based speech recognition, Amharic


Bibliographic reference: Tachbelie, Martha Yifiru / Abate, Solomon Teferra / Menzel, Wolfgang (2010): "Morpheme-based automatic speech recognition for a morphologically rich language - Amharic", In SLTU-2010, 68-73.