2nd Workshop on Spoken Language Technologies for Under-Resourced Languages

Universiti Sains, Penang, Malaysia
May 3-5, 2010

Towards Building Effective Language Translation Systems

Ruhi Sarikaya

IBM Watson Labs, NY, USA

Automatic language translation, widely known as Machine Translation (MT), has been one of the long-standing, elusive goals of natural language processing and artificial intelligence. With increasing globalization at the individual and enterprise level and the widespread use of social networking sites, the need to exchange knowledge between people who do not share a common language has put MT in the spotlight. Now, with access to vast amounts of translation data and powerful computers, we are closer than ever to achieving that goal. In this talk we focus on building usable machine translation systems. We highlight the practical and fundamental challenges in building MT systems and present our solutions and approaches on both fronts. In particular, we first give an overview of MT research, then focus on parallel data construction for MT and on language and MT modeling in continuous space. We also demonstrate working MT systems for various applications between English and several major languages.

