SLTU-2008 - First International Workshop on Spoken Languages Technologies for Under-Resourced Languages

Hanoi, Vietnam
May 5-7, 2008

Towards Human Translations Guided Language Discovery for ASR Systems

Sebastian Stüker, Alex Waibel

Institut für Theoretische Informatik, Universität Karlsruhe (TH), Karlsruhe, Germany

Natural language processing (NLP) systems, e.g., for Automatic Speech Recognition (ASR) or Machine Translation (MT), have been studied for only a fraction of the approximately 7000 languages that exist in today’s world, the majority of which have comparatively few speakers and few resources. Because of economic constraints, the traditional approach of collecting and annotating the necessary training data is not feasible for most of them. At the same time, it is of vital interest to have NLP systems address practically all languages in the world. New, efficient ways of gathering the needed training material therefore have to be found. In this paper we propose a new technique for collecting such data by exploiting the knowledge gained from human simultaneous translations, which happen frequently in the real world. To show the feasibility of our approach, we present first experiments towards constructing a pronunciation dictionary from the data gained.

Index Terms— Automatic Speech Recognition, Language Discovery, Machine Translation, Under-Resourced Languages

Bibliographic reference. Stüker, Sebastian / Waibel, Alex (2008): "Towards human translations guided language discovery for ASR systems", in SLTU-2008, 76-79.