7th International Conference on Spoken Language Processing
September 16-20, 2002
This paper describes a "re-speak" method for subtitling live TV broadcasts using a speech recognition system. Original on-location speech in live sports or music programs contains background noise, spontaneous or emotional speech, and the voices of speakers unknown to the recognition system, all of which degrade recognition performance. However, if a different speaker, to whom the system has been adapted, carefully rephrases the original utterances in a studio, these problems can be largely overcome. Recognition experiments showed that rephrasing the commentary was more effective in reducing perplexities and word error rates than simply repeating it. Speech recognition using the re-speak method was applied in practice to a music-based variety show and the 2002 Winter Olympic Games to automatically produce simultaneous subtitles for hearing-impaired viewers. A word error rate below 5% and a subtitle display delay below three seconds were achieved.
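For readers unfamiliar with the word error rate figure quoted above, the following is a minimal sketch of how such a rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. The abstract does not describe the authors' scoring procedure, and the broadcasts in question were presumably Japanese, for which tokenization would differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: splits on whitespace, which is an assumption and
    not the scoring procedure used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in recognizer output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example sentences, not data from the paper.
    rate = word_error_rate(
        "the skater lands a triple axel",
        "the skater lands triple axle",
    )
    print(f"WER = {rate:.1%}")
```

In this made-up example one word is dropped and one is substituted against a six-word reference, so the rate is 2/6. A rate below 5%, as reported in the abstract, corresponds to fewer than one word error per twenty reference words.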
Bibliographic reference. Imai, Toru / Matsui, Atsushi / Homma, Shinichi / Kobayakawa, Takeshi / Onoe, Kazuo / Sato, Shoei / Ando, Akio (2002): "Speech recognition with a re-speak method for subtitling live broadcasts", In ICSLP-2002, 1757-1760.