Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Improving Out-of-Coverage Language Modelling in a Multimodal Dialogue System Using Small Training Sets

Louis ten Bosch

Radboud Universiteit Nijmegen, The Netherlands

For automatic speech recognition, constructing an adequate language model can be difficult when only a limited amount of training text is available. Previous work has shown that, with small training sets, statistical language models may outperform grammars on out-of-coverage utterances while performing comparably on in-coverage input. In this paper, we compare the performance of an automatic speech recognition system using either a grammar or a statistical language model that includes garbage models, in the case of very limited in-domain training data. The results show that a bigram language model and a grammar perform similarly, and that including garbage models in statistical language models improves their performance on both in-coverage and out-of-coverage utterances.
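As a rough illustration of the contrast drawn above, the following Python sketch treats a grammar as a closed set of sentences and compares it with an add-one-smoothed bigram model in which rare training words are mapped to a `<garbage>` token. The sentence-set grammar, the `min_count` threshold, add-one smoothing, the toy training sentences and all function names are assumptions made for illustration; they are not the method used in the paper. In a real recognizer a garbage model is an acoustic filler model, whereas here it is reduced to a single language-model token.

```python
import math
from collections import Counter

BOS, EOS, GARBAGE = "<s>", "</s>", "<garbage>"  # hypothetical token names


def tokenize(sentence, vocab):
    """Map out-of-vocabulary words to the garbage token and add sentence markers."""
    return [BOS] + [w if w in vocab else GARBAGE for w in sentence.split()] + [EOS]


def train_bigram(sentences, min_count=2):
    """Train a bigram model; words seen fewer than min_count times become garbage."""
    counts = Counter(w for s in sentences for w in s.split())
    vocab = {w for w, c in counts.items() if c >= min_count}
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        toks = tokenize(s, vocab)
        unigrams.update(toks[:-1])                 # history counts
        bigrams.update(zip(toks[:-1], toks[1:]))   # (previous, current) pairs
    return vocab, unigrams, bigrams


def bigram_logprob(sentence, model):
    """Add-one-smoothed log probability of a sentence under the bigram model."""
    vocab, unigrams, bigrams = model
    size = len(vocab) + 2  # possible successors: vocabulary words, garbage, end-of-sentence
    toks = tokenize(sentence, vocab)
    return sum(
        math.log((bigrams[(p, c)] + 1) / (unigrams[p] + size))
        for p, c in zip(toks[:-1], toks[1:])
    )


def grammar_accepts(sentence, grammar):
    """A toy 'grammar' that covers exactly the sentences it was built from."""
    return sentence in grammar


# Toy in-domain training set (invented for illustration).
training = ["show the map", "show the route", "zoom in on the map", "show the hotel"]
model = train_bigram(training)
grammar = set(training)

print(grammar_accepts("please show the map", grammar))  # False: out of coverage
print(bigram_logprob("please show the map", model))     # finite: "please" maps to <garbage>
```

The grammar simply rejects the out-of-coverage utterance, while the bigram model still assigns it a finite score because the unknown word is absorbed by the garbage token. This mirrors, in a much simplified form, why statistical models with garbage handling can degrade more gracefully on out-of-coverage input.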


Bibliographic reference.  Bosch, Louis ten (2005): "Improving out-of-coverage language modelling in a multimodal dialogue system using small training sets", In INTERSPEECH-2005, 905-908.