INTERSPEECH 2011
12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

OOV Detection and Recovery Using Hybrid Models with Different Fragments

Long Qin, Ming Sun, Alexander Rudnicky

Carnegie Mellon University, USA

In this paper, we address the out-of-vocabulary (OOV) detection and recovery problem by developing three different fragment-word hybrid systems. A fragment language model (LM) and a word LM were trained separately and then combined into a single hybrid LM. With this hybrid model, the recognizer can output any OOV word as a sequence of fragments. Three types of fragments (phones, subwords, and graphones) were tested and compared on the WSJ 5k and 20k evaluation sets. The experimental results show that the subword and graphone hybrid systems perform better than the phone hybrid system on both the 5k and 20k tasks. Furthermore, when less training data is available, the subword hybrid system is preferable to the graphone hybrid system.
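The abstract does not say how the two LMs are combined. As a rough illustration only, the sketch below assumes one common scheme: the word LM reserves probability for an `<OOV>` token, and that mass is spread over fragment sequences by the fragment LM. The unigram tables, the probabilities, the `</frag>` end marker, and the function name `hybrid_log_prob` are all hypothetical and are not taken from the paper.

```python
import math

# Toy unigram models. All probabilities are made-up illustrative values,
# not figures from the paper.
WORD_LM = {"the": 0.5, "cat": 0.3, "<OOV>": 0.2}
FRAGMENT_LM = {"k": 0.25, "ae": 0.25, "t": 0.25, "</frag>": 0.25}


def hybrid_log_prob(tokens):
    """Score a mixed sequence under the toy hybrid model.

    Each element of `tokens` is either a word (str) or a list of fragments
    standing for one OOV word.
    """
    total = 0.0
    for token in tokens:
        if isinstance(token, list):
            # Assumed combination rule: entering the fragment model costs
            # the word LM's <OOV> probability.
            total += math.log(WORD_LM["<OOV>"])
            # The fragment LM then scores the fragments plus an end marker.
            for fragment in token + ["</frag>"]:
                total += math.log(FRAGMENT_LM[fragment])
        else:
            total += math.log(WORD_LM[token])
    return total


# An in-vocabulary word followed by an OOV word spelled out as phone fragments.
score = hybrid_log_prob(["the", ["k", "ae", "t"]])
```

In this toy setup, a decoder that outputs a fragment run in place of a word has effectively flagged an OOV region (detection), and the fragments themselves give a starting point for recovering the word's spelling or pronunciation.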


Bibliographic reference.  Qin, Long / Sun, Ming / Rudnicky, Alexander (2011): "OOV detection and recovery using hybrid models with different fragments", In INTERSPEECH-2011, 1913-1916.