ISCA Archive Interspeech 2009

Human translations guided language discovery for ASR systems

Sebastian Stüker, Laurent Besacier, Alex Waibel

The traditional approach of collecting and annotating the necessary training data is, due to economic constraints, not feasible for most of the 7,000 languages in the world. At the same time, it is of vital interest to have natural language processing systems address practically all of them. Therefore, new and efficient ways of gathering the needed training material have to be found. In this paper we continue our experiments on exploiting the knowledge gained from human simultaneous translations, which occur frequently in the real world, in order to discover word units in a new language. We evaluate our approach by measuring the performance of statistical machine translation systems trained on the word units discovered from an oracle phoneme sequence. We then improve the approach by combining it with an unsupervised word discovery technique that operates solely on the unsegmented phoneme sequences.

doi: 10.21437/Interspeech.2009-765

Cite as: Stüker, S., Besacier, L., Waibel, A. (2009) Human translations guided language discovery for ASR systems. Proc. Interspeech 2009, 3023-3026, doi: 10.21437/Interspeech.2009-765

  author={Sebastian Stüker and Laurent Besacier and Alex Waibel},
  title={{Human translations guided language discovery for ASR systems}},
  booktitle={Proc. Interspeech 2009},