ISCA Archive Interspeech 2005

Where are we in transcribing French broadcast news?

Jean-Luc Gauvain, G. Adda, Martine Adda-Decker, Alexandre Allauzen, V. Gendner, Lori Lamel, Holger Schwenk

Because French is a highly inflected language, transcribing French broadcast news (BN) is more challenging than transcribing English BN. This is due in part to the large number of homophones among the inflected forms. This paper describes advances in the automatic processing of French broadcast news speech, based on recent improvements to the LIMSI English system. The main differences between the English and French BN systems are: a 200k-word vocabulary to overcome the lower lexical coverage in French (including contextual pronunciations to model liaisons), a case-sensitive language model, and the use of a part-of-speech (POS) based language model to reduce the impact of homophonic gender and number disagreement. The resulting system was evaluated in the first French TECHNOLANGUE-ESTER ASR benchmark test, where it achieved the lowest word error rate by a significant margin. We also report on a 1xRT version of this system.

doi: 10.21437/Interspeech.2005-544

Cite as: Gauvain, J.-L., Adda, G., Adda-Decker, M., Allauzen, A., Gendner, V., Lamel, L., Schwenk, H. (2005) Where are we in transcribing French broadcast news? Proc. Interspeech 2005, 1665-1668, doi: 10.21437/Interspeech.2005-544

  author={Jean-Luc Gauvain and G. Adda and Martine Adda-Decker and Alexandre Allauzen and V. Gendner and Lori Lamel and Holger Schwenk},
  title={{Where are we in transcribing French broadcast news?}},
  booktitle={Proc. Interspeech 2005},