ISCA Archive Interspeech 2008

Class-based statistical machine translation for field maintainable speech-to-speech translation

Ian R. Lane, Alex Waibel

Current speech-to-speech translation systems lack any mechanism to handle out-of-vocabulary words, that is, words that did not appear in the training data. To improve the usability of these systems, we have developed a field maintainable speech-to-speech translation framework that enables users to add new words to the system while it is being used in the field. To realize this framework, we propose a novel class-based statistical machine translation approach that applies class-based translation models and class n-gram language models during translation. To label the parallel training corpora on which these models are trained consistently, we introduce a bilingual tagger that jointly labels both sides of the parallel corpora. On a Japanese-English evaluation system, the proposed framework significantly improved translation quality, obtaining a relative BLEU-score improvement of 15% in both translation directions.
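The abstract does not spell out how class n-gram models make field updates possible, so the following is a minimal sketch under stated assumptions. It uses the standard class bigram factorization P(w_i | w_{i-1}) = P(c_i | c_{i-1}) · P(w_i | c_i). The class label `@CITY`, the class `ClassBigramLM`, its methods, and the probability values are all hypothetical illustrations, not the authors' implementation. The point it shows: the class-level statistics are trained once on the class-labelled corpus, and adding a new word in the field only touches the class membership table.

```python
from collections import defaultdict


class ClassBigramLM:
    """Toy class-based bigram LM (hypothetical sketch, not the paper's code).

    P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i)
    Words that belong to no class act as singleton classes with P(w | c) = 1.
    """

    def __init__(self):
        self.word_class = {}                     # word -> class label
        self.class_bigram = defaultdict(dict)    # prev class -> {class: prob}
        self.member_weights = defaultdict(dict)  # class -> {word: raw weight}

    def set_class_bigram(self, prev_cls, cls, prob):
        # Trained offline on the class-labelled corpus; unchanged in the field.
        self.class_bigram[prev_cls][cls] = prob

    def add_word(self, word, cls, weight=1.0):
        # Field update: only the class membership table changes.
        old = self.word_class.get(word)
        if old is not None:
            self.member_weights[old].pop(word, None)
        self.word_class[word] = cls
        self.member_weights[cls][word] = weight

    def _cls(self, word):
        # Unclassed words are treated as their own singleton class.
        return self.word_class.get(word, word)

    def p_word_given_class(self, word):
        if word not in self.word_class:
            return 1.0
        weights = self.member_weights[self.word_class[word]]
        # Normalize the raw weights so P(w | c) sums to one within the class.
        return weights[word] / sum(weights.values())

    def prob(self, prev_word, word):
        p_class = self.class_bigram[self._cls(prev_word)].get(self._cls(word), 0.0)
        return p_class * self.p_word_given_class(word)


if __name__ == "__main__":
    lm = ClassBigramLM()
    lm.set_class_bigram("to", "@CITY", 0.2)  # hypothetical trained value
    lm.add_word("Boston", "@CITY")
    lm.add_word("Kyoto", "@CITY")
    print(lm.prob("to", "Kyoto"))            # 0.2 * 1/2
    lm.add_word("Pittsburgh", "@CITY")       # word added in the field
    print(lm.prob("to", "Pittsburgh"))       # 0.2 * 1/3
```

Uniform membership weights are used here for simplicity; a real system could weight members by observed frequency. The same idea applies on the translation-model side, where phrase pairs containing a class token such as `@CITY` can cover any word later added to that class.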


doi: 10.21437/Interspeech.2008-602

Cite as: Lane, I.R., Waibel, A. (2008) Class-based statistical machine translation for field maintainable speech-to-speech translation. Proc. Interspeech 2008, 2362-2365, doi: 10.21437/Interspeech.2008-602

@inproceedings{lane08_interspeech,
  author={Ian R. Lane and Alex Waibel},
  title={{Class-based statistical machine translation for field maintainable speech-to-speech translation}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={2362--2365},
  doi={10.21437/Interspeech.2008-602}
}