This paper addresses the detection of Out-Of-Vocabulary (OOV) words in Large Vocabulary Continuous Speech Recognition (LVCSR) systems. We propose a method, inspired by confidence measures, that analyzes the outputs of the recognition system in order to automatically detect errors caused by OOV words. The method combines features drawn from acoustic, linguistic, decoding-graph and semantic information. We evaluate each feature separately and estimate their complementarity. Experiments are conducted on a large French broadcast news corpus from the ESTER evaluation campaign. Results show good performance in real conditions: the method achieves an OOV word detection rate of 43% to 90% with a false detection rate of 2.5% to 17.5%.
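The two figures quoted above can be illustrated with a minimal sketch. The abstract does not give the exact definitions used in the paper, so the following assumes that the detection rate is the fraction of OOV-induced units that the detector flags, and that the false detection rate is the fraction of non-OOV units that it flags. The function name `detection_rates` and its arguments are hypothetical and do not come from the paper.

```python
from typing import Sequence, Tuple


def detection_rates(is_oov: Sequence[bool], flagged: Sequence[bool]) -> Tuple[float, float]:
    """Return (detection_rate, false_detection_rate) for per-unit labels.

    Assumed definitions (not taken from the paper):
      detection_rate       = flagged OOV units / all OOV units
      false_detection_rate = flagged non-OOV units / all non-OOV units
    """
    if len(is_oov) != len(flagged):
        raise ValueError("is_oov and flagged must have the same length")

    # Count correctly flagged OOV units and wrongly flagged non-OOV units.
    hits = sum(1 for truth, pred in zip(is_oov, flagged) if truth and pred)
    false_alarms = sum(1 for truth, pred in zip(is_oov, flagged) if not truth and pred)

    n_oov = sum(1 for truth in is_oov if truth)
    n_in_vocab = len(is_oov) - n_oov

    # Guard against empty classes to avoid division by zero.
    detection = hits / n_oov if n_oov else 0.0
    false_detection = false_alarms / n_in_vocab if n_in_vocab else 0.0
    return detection, false_detection


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the paper.
    truth = [True, True, False, False, False, False]
    preds = [True, False, True, False, False, False]
    print(detection_rates(truth, preds))
```

Under these assumed definitions, a more permissive detector would raise both values together, which is one way a range of detection rates can pair with a range of false detection rates. Whether the reported ranges were obtained this way is not stated in the abstract.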
Bibliographic reference. Lecouteux, Benjamin / Linarès, Georges / Favre, Benoit (2009): "Combined low level and high level features for out-of-vocabulary word detection", In INTERSPEECH-2009, 1187-1190.