5th International Conference on Spoken Language Processing
This study analyses the possible use of automatic speech recognition (ASR) for the automatic captioning of TV programs. Captioning requires: (1) transcribing the spoken words and (2) determining the times at which the caption has to appear and disappear on the screen. These times have to match as closely as possible the corresponding times on the audio signal. Automatic speech recognition can be used to determine both aspects: the spoken words and their times. This paper focuses on the question: would perfect automatic speech recognition systems be able to automate the captioning process? We present quantitative data on the discrepancy between the audio signal and the manually generated captions. We show how ASR alone can even lower the efficiency of captioning. The techniques needed to automate the captioning process are presented.
Bibliographic reference. Robert-Ribes, Jordi (1998): "On the use of automatic speech recognition for TV captioning", In ICSLP-1998, paper 0621.