8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Handling OOV Words in Arabic ASR via Flexible Morphological Constraints

Nguyen Bach, Mohamed Noamany, Ian Lane, Tanja Schultz

Carnegie Mellon University, USA

We propose a novel framework to detect and recognize out-of-vocabulary (OOV) words in automatic speech recognition (ASR). In the proposed framework, a hybrid language model combining words and sub-word units is incorporated during ASR decoding; three different OOV word recognition methods are then applied to generate OOV word hypotheses: dictionary lookup, morphological composition, and direct phoneme-to-grapheme conversion. The proposed approach reduced the word error rate (WER) by 1.9% and 1.6% for ASR systems with recognition vocabularies of 30K and 219K words, respectively. Moreover, it correctly recognized 5% of OOV words.
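The abstract names the three OOV recovery methods but does not say how they are combined. The Python sketch below shows one plausible arrangement. It assumes that the hybrid decoder emits a phoneme sequence for each detected OOV region and that the methods are tried as a fallback cascade, from most to least constrained. The lexicons, the transliterated toy entries, the single-prefix/single-suffix split, and all function names are hypothetical illustrations, not the authors' resources or implementation.

```python
from typing import Dict, Optional, Tuple

Pron = Tuple[str, ...]  # assumed: phoneme sequence the decoder emits for an OOV region

# Hypothetical toy resources for illustration only (not taken from the paper).
BACKGROUND_LEXICON: Dict[Pron, str] = {("d", "a", "r", "s"): "dars"}
PREFIXES: Dict[Pron, str] = {(): "", ("w", "a"): "wa", ("a", "l"): "al"}
STEMS: Dict[Pron, str] = {("k", "i", "t", "aa", "b"): "kitAb"}
SUFFIXES: Dict[Pron, str] = {(): "", ("h", "aa"): "hA"}
P2G: Dict[str, str] = {"aa": "A"}  # phonemes not listed map to themselves


def dictionary_lookup(pron: Pron) -> Optional[str]:
    """Method 1: exact match against a larger background lexicon."""
    return BACKGROUND_LEXICON.get(pron)


def morphological_composition(pron: Pron) -> Optional[str]:
    """Method 2: split into (optional prefix) + stem + (optional suffix),
    where every piece is a known morpheme, then concatenate the spellings.
    Simplification: at most one prefix and one suffix."""
    n = len(pron)
    for i in range(n + 1):
        prefix = PREFIXES.get(pron[:i])
        if prefix is None:
            continue
        for j in range(i + 1, n + 1):
            stem = STEMS.get(pron[i:j])
            suffix = SUFFIXES.get(pron[j:])
            if stem is not None and suffix is not None:
                return prefix + stem + suffix
    return None


def phoneme_to_grapheme(pron: Pron) -> str:
    """Method 3: map each phoneme to a grapheme independently (always succeeds)."""
    return "".join(P2G.get(p, p) for p in pron)


def recover_oov(pron: Pron) -> Tuple[str, str]:
    """Try the three methods in order and return (method, word hypothesis)."""
    word = dictionary_lookup(pron)
    if word is not None:
        return "lookup", word
    word = morphological_composition(pron)
    if word is not None:
        return "morph", word
    return "p2g", phoneme_to_grapheme(pron)


if __name__ == "__main__":
    for pron in [("d", "a", "r", "s"),
                 ("a", "l", "k", "i", "t", "aa", "b"),
                 ("s", "aa", "m", "i")]:
        print(pron, "->", recover_oov(pron))
```

Ordering the methods from exact lookup to unconstrained spelling is a design assumption here: a lexicon match or a valid morphological decomposition is more likely to yield a correctly spelled word, while phoneme-to-grapheme conversion is kept as a last resort because it always produces some hypothesis.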


Bibliographic reference.  Bach, Nguyen / Noamany, Mohamed / Lane, Ian / Schultz, Tanja (2007): "Handling OOV words in Arabic ASR via flexible morphological constraints", In INTERSPEECH-2007, 2373-2376.