Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Diachronic Vocabulary Adaptation for Broadcast News Transcription

Alexandre Allauzen, Jean-Luc Gauvain

LIMSI-CNRS, Orsay, France

This article investigates the use of Internet news sources to automatically adapt the vocabulary of a French and an English broadcast news transcription system. A specific method is developed to gather training, development and test corpora from selected websites and to normalize them for further use. A vectorial vocabulary adaptation algorithm is described that interpolates word frequencies estimated on adaptation corpora to directly maximize lexical coverage on a development corpus. To test the generality of this approach, experiments were carried out simultaneously in French and in English (UK) on a daily basis for the month of May 2004. In both languages, the OOV rate is reduced by more than half.
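The idea of interpolating word frequencies to maximize lexical coverage can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the interpolation weights are optimized, so a coarse grid search is assumed here, and all function names (`relative_freqs`, `select_vocab`, `oov_rate`, `adapt_vocab`) and the toy corpora are hypothetical.

```python
from collections import Counter
from itertools import product


def relative_freqs(tokens):
    """Relative frequency of each word in one adaptation corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def select_vocab(freq_tables, weights, size):
    """Keep the `size` words with the highest interpolated frequency."""
    scores = Counter()
    for table, lam in zip(freq_tables, weights):
        for word, freq in table.items():
            scores[word] += lam * freq
    # Ties are broken alphabetically so the selection is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word for word, _ in ranked[:size]}


def oov_rate(vocab, dev_tokens):
    """Fraction of development tokens not covered by the vocabulary."""
    if not dev_tokens:
        return 0.0
    return sum(1 for w in dev_tokens if w not in vocab) / len(dev_tokens)


def adapt_vocab(corpora, dev_tokens, size, steps=10):
    """Grid-search interpolation weights (assumed strategy, not from the
    paper) and return (best OOV rate, weights, vocabulary)."""
    tables = [relative_freqs(c) for c in corpora]
    best = None
    for combo in product(range(steps + 1), repeat=len(tables)):
        if sum(combo) != steps:  # weights must sum to 1
            continue
        weights = [k / steps for k in combo]
        vocab = select_vocab(tables, weights, size)
        rate = oov_rate(vocab, dev_tokens)
        if best is None or rate < best[0]:
            best = (rate, weights, vocab)
    return best


# Toy data: an older corpus, a recent web corpus, and a development text.
old = "the minister said the budget the budget".split()
recent = "the election results election candidate".split()
dev = "the election candidate said".split()

rate, weights, vocab = adapt_vocab([old, recent], dev, size=3)
print(rate, weights, sorted(vocab))
```

In this toy setting, a vocabulary built from either corpus alone covers the development text less well than the interpolated one, which is the effect the adaptation aims for. The exhaustive grid grows quickly with the number of corpora, so it only serves as an illustration.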

Bibliographic reference. Allauzen, Alexandre / Gauvain, Jean-Luc (2005): "Diachronic vocabulary adaptation for broadcast news transcription", In INTERSPEECH-2005, 1305-1308.