ISCA Archive Eurospeech 2001

Broadcast news LM adaptation using contemporary texts

Marcello Federico, Nicola Bertoldi

This paper investigates the problem of dynamically updating the language model (LM) of a broadcast news speech recognition system in order to cope with the language and topic changes typical of the news domain. Statistical adaptation methods are proposed that exploit written news sources available daily on the Internet, namely newswires and newspapers. Specifically, LM adaptation is performed by extending the basic lexicon, in order to minimize the out-of-vocabulary (OOV) rate, and by adapting the word probability distribution to the contemporary data. Experiments performed on 19 newscasts showed relative reductions of 58% in the OOV rate, 16% in perplexity, and 4% in word error rate.
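
As an illustration of the quantities named in the abstract, the following minimal sketch computes an OOV rate before and after extending a lexicon with words from contemporary text, and the resulting relative reduction. It is not the authors' method: the frequency-based word selection, the function names (`oov_rate`, `extend_lexicon`, `relative_reduction`), and the toy word lists are all assumptions made for this example.

```python
from collections import Counter


def oov_rate(tokens, lexicon):
    """Fraction of running words that are not covered by the lexicon."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in lexicon) / len(tokens)


def extend_lexicon(lexicon, contemporary_tokens, max_new):
    """Return a new lexicon with up to max_new unseen words added.

    Assumption: new words are chosen by raw frequency in the contemporary
    text; the paper's actual selection criterion is not given in the abstract.
    """
    counts = Counter(t for t in contemporary_tokens if t not in lexicon)
    return set(lexicon) | {word for word, _ in counts.most_common(max_new)}


def relative_reduction(before, after):
    """Relative reduction of a rate, e.g. 0.5 means a 50% reduction."""
    return (before - after) / before


# Hypothetical toy data, for illustration only.
base_lexicon = {"the", "minister", "said", "today", "in", "rome"}
newswire = (
    "the euro summit in genoa opened today the euro summit the euro fell"
).split()
newscast = "the minister said the euro summit opened in genoa".split()

before = oov_rate(newscast, base_lexicon)
adapted = extend_lexicon(base_lexicon, newswire, max_new=2)
after = oov_rate(newscast, adapted)

print(f"OOV rate before: {before:.3f}")
print(f"OOV rate after:  {after:.3f}")
print(f"Relative reduction: {relative_reduction(before, after):.0%}")
```

The relative reductions reported in the abstract are of this form: the difference between the baseline and adapted values, divided by the baseline value.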


doi: 10.21437/Eurospeech.2001-82

Cite as: Federico, M., Bertoldi, N. (2001) Broadcast news LM adaptation using contemporary texts. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 239-242, doi: 10.21437/Eurospeech.2001-82

@inproceedings{federico01_eurospeech,
  author={Marcello Federico and Nicola Bertoldi},
  title={{Broadcast news LM adaptation using contemporary texts}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={239--242},
  doi={10.21437/Eurospeech.2001-82}
}