15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Automated Production of True-Cased Punctuated Subtitles for Weather and News Broadcasts

Joris Driesen (1), Alexandra Birch (1), Simon Grimsey (2), Saeid Safarfashandi (2), Juliet Gauthier (2), Matt Simpson (2), Steve Renals (1)

(1) University of Edinburgh, UK
(2) Red Bee Media, UK

Subtitling multimedia content is a costly process. Any system that automates at least part of this process may therefore yield significant economic benefits for content providers. In this paper, we present an integrated system capable of automatically subtitling weather forecasts and news broadcasts. In this system, a number of different modules are strung together, each performing a single processing step in the pipeline. An ASR (Automatic Speech Recognition) module first converts raw audio into an uninterrupted stream of written words. A decision tree classifier then marks sentence boundaries in the resulting word sequence. Finally, an SMT (Statistical Machine Translation) module 'translates' the resulting sentences into punctuated, true-cased text. The system has been developed in close cooperation with Red Bee Media and will be deployed in their commercial production pipeline.
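
The three-stage pipeline described above can be illustrated with a minimal sketch. It is not the system presented in the paper: the ASR output is assumed to arrive as a list of timed words, a simple pause threshold stands in for the decision tree classifier, and a small lookup table stands in for the SMT module. All names (Word, split_sentences, restore_case_and_punctuation, PROPER_NOUNS, max_pause) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """One lower-cased word as an ASR module might emit it (assumed format)."""
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds


def split_sentences(words: List[Word], max_pause: float = 0.5) -> List[List[str]]:
    """Stand-in for the boundary classifier: break where the pause exceeds max_pause."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for i, word in enumerate(words):
        current.append(word.text)
        is_last = i == len(words) - 1
        if is_last or words[i + 1].start - word.end > max_pause:
            sentences.append(current)
            current = []
    return sentences


# Toy lookup standing in for a trained translation model (illustrative only).
PROPER_NOUNS = {"scotland": "Scotland", "london": "London"}


def restore_case_and_punctuation(sentence: List[str]) -> str:
    """Stand-in for the SMT step: map raw tokens to true-cased, punctuated text."""
    tokens = [PROPER_NOUNS.get(token, token) for token in sentence]
    tokens[0] = tokens[0][0].upper() + tokens[0][1:]
    return " ".join(tokens) + "."


def subtitle(words: List[Word]) -> List[str]:
    """Chain the stages: timed words -> sentences -> punctuated true-cased text."""
    return [restore_case_and_punctuation(s) for s in split_sentences(words)]


asr_output = [
    Word("rain", 0.0, 0.3), Word("in", 0.3, 0.4), Word("scotland", 0.4, 0.9),
    Word("sunny", 1.8, 2.2), Word("in", 2.2, 2.3), Word("london", 2.3, 2.8),
]
subtitles = subtitle(asr_output)
```

Because each stage only consumes the output of the previous one, any stand-in here could in principle be swapped for a trained model without changing the surrounding code, which is the modularity the abstract describes.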


Bibliographic reference. Driesen, Joris / Birch, Alexandra / Grimsey, Simon / Safarfashandi, Saeid / Gauthier, Juliet / Simpson, Matt / Renals, Steve (2014): "Automated production of true-cased punctuated subtitles for weather and news broadcasts", in INTERSPEECH-2014, 2146-2147.