ISCA Archive Interspeech 2013

Unsupervised language model adaptation for automatic speech recognition of broadcast news using web 2.0

Tim Schlippe, Lukasz Gren, Ngoc Thang Vu, Tanja Schultz

We improve the automatic speech recognition of broadcast news using paradigms from Web 2.0 to obtain time- and topic-relevant text data for language modeling. We develop an unsupervised text collection and decoding strategy that includes crawling appropriate texts from RSS Feeds, complementing them with texts from Twitter, language model and vocabulary adaptation, as well as 2-pass decoding. The word error rates of the tested French broadcast news shows from Europe 1 are reduced by almost 32% relative with an underlying language model from the GlobalPhone project and by almost 4% with an underlying language model from the Quaero project. The tools that we use for text normalization, the collection of RSS Feeds together with the text on the related websites, TF-IDF-based topic word extraction, and language model interpolation are available in our Rapid Language Adaptation Toolkit.
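
To illustrate the TF-IDF-based topic word extraction mentioned above, the following is a minimal sketch in Python. It is not the Rapid Language Adaptation Toolkit implementation: the function names `tokenize` and `topic_words`, the simple letter-based tokenizer, and the plain `tf * log(N / df)` weighting are all assumptions chosen for illustration. The idea is that words frequent in one document but rare across the collection score highest and could serve as search terms for collecting further topic-relevant text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of Unicode letters only.
    return re.findall(r"[^\W\d_]+", text.lower())


def topic_words(documents, target_index, top_n=5):
    """Return the top_n words of one document ranked by TF-IDF.

    Assumed weighting: relative term frequency in the target document
    times log(N / document frequency) over the whole collection.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each word occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Term frequency within the target document.
    tf = Counter(tokenized[target_index])
    total = sum(tf.values())
    if total == 0:
        return []

    scores = {
        word: (count / total) * math.log(n_docs / df[word])
        for word, count in tf.items()
    }
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
```

In this sketch, a word that occurs in every document receives a score of zero, so common function words fall to the bottom of the ranking without an explicit stop word list.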


doi: 10.21437/Interspeech.2013-620

Cite as: Schlippe, T., Gren, L., Vu, N.T., Schultz, T. (2013) Unsupervised language model adaptation for automatic speech recognition of broadcast news using web 2.0. Proc. Interspeech 2013, 2698-2702, doi: 10.21437/Interspeech.2013-620

@inproceedings{schlippe13_interspeech,
  author={Tim Schlippe and Lukasz Gren and Ngoc Thang Vu and Tanja Schultz},
  title={{Unsupervised language model adaptation for automatic speech recognition of broadcast news using web 2.0}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={2698--2702},
  doi={10.21437/Interspeech.2013-620}
}