2nd Workshop on Spoken Language Technologies for Under-Resourced Languages
Universiti Sains, Penang, Malaysia
Out-of-vocabulary (OOV) words are a major source of error in speech recognition systems, and various methods have been proposed to improve performance by handling them properly. This paper presents an automatic speech recognition experiment conducted to examine the effect of OOV words on the performance of a speech recognition system for Amharic, a morphologically rich language. We tried to solve the OOV problem by using morphemes as dictionary and language model units. For a small-vocabulary (5k) system, morphemes were found to be better lexical and language modeling units than words: using a morph-based vocabulary yielded an absolute improvement of 11.57% in word recognition accuracy. For large vocabularies, however, morpheme-based systems did not bring much improvement, as they suffer from acoustic confusability and limited language model scope, while word-based recognizers benefit considerably from the reduced OOV rate.
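To make the OOV argument in the abstract concrete, the following is a minimal sketch of how an OOV rate can be computed for a word vocabulary and for a morph vocabulary built from the same training text. It is an illustration only, not the authors' setup: the toy English-like corpus, the `segment` function with its fixed suffix list, and the vocabulary size are all hypothetical stand-ins for a real Amharic corpus and a real morphological segmenter.

```python
from collections import Counter


def build_vocab(tokens, size):
    """Return the `size` most frequent tokens as a set."""
    return {tok for tok, _ in Counter(tokens).most_common(size)}


def oov_rate(vocab, tokens):
    """Fraction of running tokens that are missing from the vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for tok in tokens if tok not in vocab)
    return missing / len(tokens)


def segment(word, suffixes=("ing", "ed", "s")):
    """Toy segmenter (hypothetical): split off one known suffix, if any."""
    for suf in suffixes:
        if word.endswith(suf) and len(word) > len(suf):
            return [word[: -len(suf)], "+" + suf]
    return [word]


# Toy data (assumption): stems recur, but the inflected forms differ
# between training and test text, as is typical of rich morphology.
train_words = "walk walked talk talks play playing".split()
test_words = "walks talked plays talking".split()

train_morphs = [m for w in train_words for m in segment(w)]
test_morphs = [m for w in test_words for m in segment(w)]

word_vocab = build_vocab(train_words, 10)
morph_vocab = build_vocab(train_morphs, 10)

word_oov = oov_rate(word_vocab, test_words)
morph_oov = oov_rate(morph_vocab, test_morphs)

print(f"word OOV rate:  {word_oov:.2f}")
print(f"morph OOV rate: {morph_oov:.2f}")
```

In this toy case every test word is unseen as a whole word, yet all of its morphs were seen in training, so the morph vocabulary covers the test text completely. The sketch shows only the coverage side of the trade-off; it says nothing about the acoustic confusability or the shorter language model span of morph units that the abstract identifies as limiting factors for large vocabularies.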
Index Terms: Out-of-Vocabulary problem, Morpheme-based speech recognition, Amharic
Bibliographic reference. Tachbelie, Martha Yifiru / Abate, Solomon Teferra / Menzel, Wolfgang (2010): "Morpheme-based automatic speech recognition for a morphologically rich language - Amharic", In SLTU-2010, 68-73.