9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Language Model Adaptation for a Speech to Sign Language Translation System Using Web Frequencies and a MAP Framework

Luis Fernando D'Haro (1), Ruben San-Segundo (1), Ricardo de Cordoba (1), Jan Bungeroth (2), Daniel Stein (2), Hermann Ney (2)

(1) Universidad Politécnica de Madrid, Spain
(2) RWTH Aachen University, Germany

This paper presents a successful technique for creating a new language model (LM) that adapts the original target LM used by a machine translation (MT) system. The technique is especially useful when resources for training the target side (Spanish Sign Language (LSE) in our case) are too scarce to properly estimate the target LM, the Sign Language Model (SLM), used by the MT system. It uses information from the source language, Spanish in our task, and from the phrase-based translation matrix to create a new LM, estimated using web frequencies, which adapts the counts of the SLM through the Maximum A Posteriori (MAP) method. The corpus consists of commonly used sentences spoken by an officer when assisting people in applying for, or renewing, the National Identification Document. The proposed technique yields relative reductions of 15.5% in perplexity and 2.7% in WER for translation, which are close to half the maximum improvement obtainable when only the LM is optimized.
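
As an illustration of the kind of count adaptation described in the abstract, the following minimal sketch merges n-gram counts from a small in-domain model with counts from a second source, using a common count-merging form of MAP adaptation: c(h, w) = c_base(h, w) + tau * c_web(h, w). This form, the weight `tau`, the function names, and the sign glosses in the toy data are assumptions made for illustration; the abstract does not give the exact formulation used in the paper.

```python
from collections import Counter


def map_adapt(base_counts, web_counts, tau=1.0):
    """Merge two bigram count tables: c = c_base + tau * c_web.

    tau is a hypothetical weight controlling how strongly the
    web-derived counts influence the adapted model.
    """
    merged = Counter()
    for ngram, count in base_counts.items():
        merged[ngram] += count
    for ngram, count in web_counts.items():
        merged[ngram] += tau * count
    return merged


def bigram_prob(counts, history, word):
    """Maximum-likelihood bigram probability from (history, word) counts."""
    total = sum(c for (h, _), c in counts.items() if h == history)
    if total == 0:
        return 0.0
    return counts.get((history, word), 0) / total


# Toy counts with invented glosses; these are not data from the paper.
base = {("DNI", "RENOVAR"): 3, ("DNI", "NECESITAR"): 1}
web = {("DNI", "RENOVAR"): 2, ("DNI", "FOTO"): 4}

adapted = map_adapt(base, web, tau=0.5)
p_renovar = bigram_prob(adapted, "DNI", "RENOVAR")
p_foto = bigram_prob(adapted, "DNI", "FOTO")
```

In this toy case the bigram ("DNI", "FOTO"), unseen in the base counts, receives non-zero probability after adaptation. A practical system would also apply smoothing and tune `tau` on held-out data.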


Bibliographic reference. D'Haro, Luis Fernando / San-Segundo, Ruben / Cordoba, Ricardo de / Bungeroth, Jan / Stein, Daniel / Ney, Hermann (2008): "Language model adaptation for a speech to sign language translation system using web frequencies and a MAP framework", in INTERSPEECH-2008, 2199-2202.