8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Using Statistical Language Modelling to Identify New Vocabulary in a Grammar-Based Speech Recognition System

Genevieve Gorrell

Linköping University, Sweden

Spoken language recognition runs into difficulty when it encounters an unknown word. Not only is the new word itself unrecognisable, but its presence also degrades recognition of the surrounding words. This paper explores using a back-off statistical recogniser to allow recognition of out-of-vocabulary words in a grammar-based speech recognition system. The study shows that a statistical language model, created from a corpus obtained using a grammar-based system and augmented with minimally-constrained domain-appropriate material, allows words that are outside the grammar's vocabulary to be extracted from an unseen corpus with fairly high precision.
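
The extraction step described in the abstract can be illustrated with a minimal sketch. It assumes the statistical recogniser's output is available as a plain text hypothesis and that the grammar's vocabulary is a known set of words. The function names, the toy vocabulary, the example utterances, and the precision definition (the fraction of extracted words that also occur in the reference transcript) are all assumptions made for illustration; the abstract does not specify the paper's procedure or evaluation details.

```python
def extract_new_words(hypothesis, grammar_vocab):
    """Return hypothesis words that the grammar's vocabulary does not cover."""
    return [w for w in hypothesis.lower().split() if w not in grammar_vocab]


def precision(extracted, reference):
    """Fraction of extracted words that also occur in the reference transcript."""
    if not extracted:
        return 0.0
    ref_words = set(reference.lower().split())
    hits = sum(1 for w in extracted if w in ref_words)
    return hits / len(extracted)


# Hypothetical data: a tiny grammar vocabulary and one utterance.
grammar_vocab = {"turn", "on", "off", "the", "light", "in", "kitchen"}
hypothesis = "turn on the lamp in the hallway"  # assumed statistical recogniser output
reference = "turn on the lamp in the hall"      # assumed human transcript

new_words = extract_new_words(hypothesis, grammar_vocab)
print(new_words)
print(precision(new_words, reference))
```

In this toy case one of the two extracted words is a genuine new word and the other is a misrecognition, which is the kind of error a precision measure is meant to capture.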


Bibliographic reference. Gorrell, Genevieve (2003): "Using statistical language modelling to identify new vocabulary in a grammar-based speech recognition system", in EUROSPEECH-2003, 2729-2732.