8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Generic Class-Based Statistical Language Models for Robust Speech Understanding in Directed Dialog Applications

Matthieu Hébert

Nuance Communications, Canada

We investigate the use of class-based statistical language models (SLMs) for robust speech understanding. Generic class-based SLMs are built using data from several applications and then tested on data from a distinct target application to assess their portability. The results show that these generic class-based SLMs perform as well as class-based SLMs trained on data from the target application itself. This leads us to conclude that, for directed dialog applications, words that do not fall within a rule (class) are generic across applications. The generic class-based SLMs can also be used to automatically transcribe utterances from the target application with high accuracy. These transcriptions are then used to train a word-based SLM, and the resulting word-based SLM outperforms the class-based ones.
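The abstract gives no implementation details. As a rough illustration of what "class-based" means here, the following Python sketch assumes that each rule (class) is a small list of phrases that are replaced by a class token before n-gram counts are collected. It pools training utterances from several applications into one generic bigram model and measures its perplexity on utterances from a distinct target application. The class rules, the utterances, the names `CLASS_RULES`, `classify` and `BigramLM`, and the add-one smoothing are all hypothetical choices made for brevity, not the author's method.

```python
from collections import Counter
import math

# Hypothetical rules: each phrase (tuple of words) maps to a class token.
# Invented for illustration; real rules would come from the application grammars.
CLASS_RULES = {
    ("yes",): "<YESNO>",
    ("no",): "<YESNO>",
    ("checking",): "<ACCOUNT>",
    ("savings",): "<ACCOUNT>",
    ("new", "york"): "<CITY>",
    ("boston",): "<CITY>",
}


def classify(words):
    """Replace the longest class phrase starting at each position with its class token."""
    max_len = max(len(p) for p in CLASS_RULES)
    out, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in CLASS_RULES:
                out.append(CLASS_RULES[phrase])
                i += n
                break
        else:  # no class phrase starts here: keep the literal word
            out.append(words[i])
            i += 1
    return out


class BigramLM:
    """Class-based bigram model with add-one smoothing (an assumption for brevity)."""

    def __init__(self, sentences):
        self.history = Counter()
        self.bigrams = Counter()
        for s in sentences:
            tokens = ["<s>"] + classify(s.split()) + ["</s>"]
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens[:-1], tokens[1:]))
        self.vocab = {w for pair in self.bigrams for w in pair}

    def prob(self, prev, word):
        # Add-one smoothing over the training vocabulary plus one unknown token.
        v = len(self.vocab) + 1
        return (self.bigrams[(prev, word)] + 1) / (self.history[prev] + v)

    def perplexity(self, sentences):
        log_sum, n = 0.0, 0
        for s in sentences:
            tokens = ["<s>"] + classify(s.split()) + ["</s>"]
            for prev, word in zip(tokens[:-1], tokens[1:]):
                log_sum += math.log(self.prob(prev, word))
                n += 1
        return math.exp(-log_sum / n)


# Generic model: pooled utterances from several hypothetical source applications.
generic = BigramLM([
    "yes please", "no thanks", "transfer to savings", "i want checking",
    "a flight to boston", "fly me to new york",
])
# Held-out utterances from a distinct (hypothetical) target application.
target = ["to new york please", "yes to savings"]
print(generic.perplexity(target))
```

Because "boston" and "new york" both map to `<CITY>`, their counts are shared, so the words around a class (here "to" or "please") carry the statistics that transfer across applications, which is the property the abstract attributes to words outside the rules.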


Bibliographic reference: Hébert, Matthieu (2007): "Generic class-based statistical language models for robust speech understanding in directed dialog applications", In INTERSPEECH-2007, pp. 2809-2812.