9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Class-Based Statistical Machine Translation for Field Maintainable Speech-to-Speech Translation

Ian R. Lane, Alex Waibel

Mobile Technologies LLC, USA

Current speech-to-speech translation systems lack any mechanism for handling out-of-vocabulary words, that is, words that did not appear in the training data. To improve the usability of these systems, we have developed a field maintainable speech-to-speech translation framework that enables users to add new words to the system while it is in use in the field. To realize this, we propose a novel class-based statistical machine translation framework that applies class-based translation models and class n-gram language models during translation. To obtain consistent labelling of the parallel training corpora on which these models are trained, we introduce a bilingual tagger that jointly labels both sides of the parallel corpora. On a Japanese-English evaluation system, the proposed framework significantly improved translation quality, obtaining a relative improvement in BLEU score of 15% for both translation directions.
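
The idea of adding words to a class in the field can be illustrated with a toy sketch. It is not the system described in the paper: the `ClassLexicon` class, the `translate` function, the `@CITY` tag, the lookup table standing in for the class-based translation and language models, and the example words are all hypothetical, and the sketch assumes single-token class members that fill slots in source order.

```python
# Toy illustration only: every name and data item here is hypothetical.

class ClassLexicon:
    """Bilingual word lists grouped by class, extendable at run time."""

    def __init__(self):
        self.entries = {}  # class tag -> {source word: target word}

    def add_word(self, tag, source, target):
        """Register a new bilingual word pair under an existing or new class."""
        self.entries.setdefault(tag, {})[source] = target

    def lookup(self, source):
        """Return (class tag, target word) for a known word, else None."""
        for tag, words in self.entries.items():
            if source in words:
                return tag, words[source]
        return None


def translate(tokens, lexicon, template_table):
    """Translate via class tags; returns None if no class-level template matches."""
    # 1. Replace class members with their class tag and remember their translations.
    tagged, fillers = [], []
    for tok in tokens:
        hit = lexicon.lookup(tok)
        if hit:
            tagged.append(hit[0])
            fillers.append(hit[1])
        else:
            tagged.append(tok)

    # 2. Translate the class-level sentence. A plain dictionary stands in for
    #    the class-based translation model and class n-gram language model.
    target_template = template_table.get(tuple(tagged))
    if target_template is None:
        return None

    # 3. Fill each class slot with the stored translation, in source order.
    remaining = iter(fillers)
    return [next(remaining) if tok.startswith("@") else tok
            for tok in target_template]
```

In this sketch, a sentence containing an unknown city name has no matching template until the user calls `add_word("@CITY", ...)`; after that, the same class-level template covers the new word without any retraining.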

Bibliographic reference.  Lane, Ian R. / Waibel, Alex (2008): "Class-based statistical machine translation for field maintainable speech-to-speech translation", In INTERSPEECH-2008, 2362-2365.