10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Combined Low Level and High Level Features for Out-of-Vocabulary Word Detection

Benjamin Lecouteux (1), Georges Linarès (1), Benoit Favre (2)

(1) LIA, France

This paper addresses Out-Of-Vocabulary (OOV) word detection in Large Vocabulary Continuous Speech Recognition (LVCSR) systems. We propose a method, inspired by confidence measures, that analyzes the recognition system outputs to automatically detect errors due to OOV words. The method combines features drawn from acoustics, linguistics, the decoding graph, and semantics. We evaluate each feature separately and estimate how complementary the features are. Experiments are conducted on a large French broadcast news corpus from the ESTER evaluation campaign. Results show good performance in real conditions: the method obtains an OOV word detection rate of 43% to 90% with 2.5% to 17.5% false detections.
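The trade-off between detection rate and false detections reported above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the features are combined or how the false detection rate is normalized. The sketch assumes a simple weighted average as a stand-in combiner and counts false detections over in-vocabulary words. The names `combine_scores` and `detection_rates`, and all values in the usage note, are hypothetical.

```python
from typing import Sequence, Tuple


def combine_scores(features: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-feature OOV scores.

    Assumption: a stand-in fusion rule for illustration only; the abstract
    does not specify the combiner used in the paper.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights)) / sum(weights)


def detection_rates(
    scores: Sequence[float], is_oov: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Return (detection rate, false detection rate) at one threshold.

    A word is flagged as OOV when its score is >= threshold.
    Detection rate: flagged OOV words / all OOV words.
    False detection rate: flagged in-vocabulary words / all in-vocabulary
    words (an assumed normalization, not confirmed by the abstract).
    """
    n_oov = sum(1 for o in is_oov if o)
    n_iv = len(is_oov) - n_oov
    if n_oov == 0 or n_iv == 0:
        raise ValueError("need at least one OOV and one in-vocabulary word")
    hits = sum(1 for s, o in zip(scores, is_oov) if o and s >= threshold)
    false_alarms = sum(1 for s, o in zip(scores, is_oov) if not o and s >= threshold)
    return hits / n_oov, false_alarms / n_iv
```

For example, with toy scores `[0.9, 0.7, 0.5, 0.3, 0.6, 0.4, 0.2, 0.1]`, where the first four words are OOV and the last four are in-vocabulary, lowering the threshold from 0.65 to 0.45 raises the detection rate from 0.5 to 0.75 and the false detection rate from 0.0 to 0.25. A range of operating points like this is the kind of trade-off the reported figures span.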


Bibliographic reference. Lecouteux, Benjamin / Linarès, Georges / Favre, Benoit (2009): "Combined low level and high level features for out-of-vocabulary word detection", In INTERSPEECH-2009, 1187-1190.