ISCA Archive Interspeech 2021

Subtitle Translation as Markup Translation

Colin Cherry, Naveen Arivazhagan, Dirk Padfield, Maxim Krikun

Automatic subtitle translation is an important technology for making video content available across language barriers. Subtitle translation complicates the standard translation problem by adding the challenge of formatting the system output into subtitles. We propose a simple technique that treats subtitle translation as standard sentence translation plus alignment-driven markup transfer, which enables us to reliably maintain timing and formatting information from the source subtitles. We also introduce two metrics to measure the quality of subtitle boundaries: a Timed BLEU that penalizes mistimed tokens with respect to a reference subtitle sequence, and a measure of how much Timed BLEU is lost due to suboptimal subtitle boundary placement. In experiments on TED and YouTube subtitles, we show that we achieve much better translation quality than a baseline that translates each subtitle independently, while coming very close to optimal subtitle boundary placement.
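To make the idea of alignment-driven markup transfer concrete, the following is a minimal sketch of how subtitle boundaries in a source sentence might be projected onto its translation using word alignments. It is an illustration under stated assumptions, not the algorithm from the paper: the function names `project_boundaries` and `split_tokens`, the input format (a list of `(source_index, target_index)` alignment links), and the criterion of choosing the target split that is crossed by the fewest links are all hypothetical choices made for this example.

```python
from typing import List, Tuple


def project_boundaries(src_sizes: List[int],
                       alignment: List[Tuple[int, int]],
                       tgt_len: int) -> List[int]:
    """Project source subtitle boundaries onto a target sentence.

    src_sizes: number of source tokens in each source subtitle.
    alignment: (source_index, target_index) word-alignment links.
    tgt_len:   number of tokens in the translated sentence.
    Returns the target token positions at which to break subtitles.
    Assumption: the best split is the one crossed by the fewest links.
    """
    # Source split points are cumulative token counts (last one excluded).
    src_splits = []
    total = 0
    for size in src_sizes[:-1]:
        total += size
        src_splits.append(total)

    tgt_splits = []
    prev = 0  # keep target splits monotonically non-decreasing
    for s in src_splits:
        best_t, best_cost = prev, None
        for t in range(prev, tgt_len + 1):
            # A link crosses the boundary if its two ends fall on
            # different sides of the (s, t) split.
            cost = sum(1 for i, j in alignment if (i < s) != (j < t))
            if best_cost is None or cost < best_cost:
                best_t, best_cost = t, cost
        tgt_splits.append(best_t)
        prev = best_t
    return tgt_splits


def split_tokens(tokens: List[str], splits: List[int]) -> List[List[str]]:
    """Cut a token list into subtitles at the given positions."""
    bounds = [0] + splits + [len(tokens)]
    return [tokens[a:b] for a, b in zip(bounds, bounds[1:])]


# Source subtitles: "I saw" | "the red house" (2 and 3 tokens).
target = ["J'ai", "vu", "la", "maison", "rouge"]
links = [(0, 0), (1, 1), (2, 2), (3, 4), (4, 3)]
breaks = project_boundaries([2, 3], links, len(target))
subtitles = split_tokens(target, breaks)
```

In this toy example the local reordering of "red house" into "maison rouge" stays inside the second subtitle, so the projected break falls after the second target token. Because each target subtitle corresponds to exactly one source subtitle, timing information could then be copied over from the source unchanged.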

doi: 10.21437/Interspeech.2021-744

Cite as: Cherry, C., Arivazhagan, N., Padfield, D., Krikun, M. (2021) Subtitle Translation as Markup Translation. Proc. Interspeech 2021, 2237-2241, doi: 10.21437/Interspeech.2021-744

  author={Colin Cherry and Naveen Arivazhagan and Dirk Padfield and Maxim Krikun},
  title={{Subtitle Translation as Markup Translation}},
  booktitle={Proc. Interspeech 2021},