8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Using Phonetic Features in Unsupervised Word Decompounding for ASR with Application to a Less-Represented Language

Thomas Pellegrini, Lori Lamel

LIMSI, France

This paper describes a data-driven word decompounding algorithm and applies it to a broadcast news corpus in Amharic. Because word decompounding increases phonetic confusability, the baseline algorithm has been enhanced by incorporating phonetic properties and constraints on recognition units derived from prior forced-alignment experiments. Speech recognition experiments have been carried out to validate the approach. Out-of-vocabulary (OOV) word rates can be reduced by 30% to 40%, and an absolute word error rate (WER) reduction of 0.4% has been achieved. The algorithm is relatively language-independent and requires minimal adaptation to be applied to other languages.
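
The two metrics quoted above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the placeholder tokens, and the hand-written split are all hypothetical, and it only shows how an OOV rate is computed and how a relative reduction differs from an absolute one. The abstract does not say whether the 30% to 40% figure is relative; the sketch simply computes both kinds of reduction.

```python
def oov_rate(tokens, vocabulary):
    """Fraction of running tokens that are absent from the vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


def absolute_reduction(before, after):
    """Difference between two rates, in the same unit as the rates."""
    return before - after


def relative_reduction(before, after):
    """Reduction expressed as a fraction of the starting rate."""
    if before == 0:
        raise ValueError("relative reduction is undefined for a zero baseline")
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical placeholder data, not Amharic and not from the paper.
    vocabulary = {"ab", "cd", "ef", "abcd"}
    test_tokens = ["abcd", "abef", "cd", "gh"]
    # Assumed decompounding output: "abef" is split into "ab" + "ef".
    decompounded_tokens = ["abcd", "ab", "ef", "cd", "gh"]

    before = oov_rate(test_tokens, vocabulary)
    after = oov_rate(decompounded_tokens, vocabulary)
    print("OOV rate before:", before)
    print("OOV rate after:", after)
    print("absolute reduction:", absolute_reduction(before, after))
    print("relative reduction:", relative_reduction(before, after))
```

Splitting a word changes the number of running tokens as well as the number of OOV tokens, which is why the rate is recomputed over the decompounded token sequence rather than the original one.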

Bibliographic reference. Pellegrini, Thomas / Lamel, Lori (2007): "Using phonetic features in unsupervised word decompounding for ASR with application to a less-represented language", in INTERSPEECH-2007, 1797-1800.