5th International Conference on Spoken Language Processing
This paper describes a method for automatically synchronizing TV news speech with its captions. A news item consists of sentences and often has a corresponding computerized text, which can be used as a caption. We have developed a new phonetic HMM-based word spotter. In this word spotter, the word sequences before and after a synchronization point are concatenated, and scoring is based on the state at the synchronization point. The detection accuracy of the proposed method is shown to be superior to that of a conventional method that does not use word sequence pairs. Model configurations are presented for handling detection failures, an announcer's misstatements and restatements, and erroneous transcriptions. A 100% detection rate with no false alarms is achieved by combining multiple word sequence pairs in series. A 100% detection rate with few false alarms is obtained by using the model configurations for misstatements or erroneous transcriptions.
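As a rough illustration of the "word sequence pair" idea only, the following sketch builds, for each candidate synchronization point in a caption text, the pair of word sequences immediately before and after it. It assumes that synchronization points fall at sentence boundaries and that each sequence is limited to `k` words; both the function name `word_sequence_pairs` and the parameter `k` are hypothetical and do not come from the paper. The acoustic side (HMM construction and scoring at the synchronization point) is not shown.

```python
from typing import List, Tuple

WordSeqPair = Tuple[List[str], List[str]]


def word_sequence_pairs(sentences: List[str], k: int = 3) -> List[WordSeqPair]:
    """Return one (words_before, words_after) pair per sentence boundary.

    Assumption: each boundary between consecutive caption sentences is a
    candidate synchronization point, and k words on each side are kept.
    """
    pairs: List[WordSeqPair] = []
    for before, after in zip(sentences, sentences[1:]):
        left = before.split()[-k:]   # last k words before the boundary
        right = after.split()[:k]    # first k words after the boundary
        pairs.append((left, right))
    return pairs


# Made-up caption text, for illustration only.
caption = [
    "the prime minister arrived in tokyo today",
    "he will meet business leaders tomorrow",
    "talks are expected to focus on trade",
]
pairs = word_sequence_pairs(caption, k=2)
for left, right in pairs:
    print(" ".join(left), "|", " ".join(right))
```

In a spotter of the kind the abstract describes, each such pair would presumably be turned into a concatenated model, with the boundary between the two halves marking the point to detect in the audio.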
Bibliographic reference. Maruyama, Ichiro / Abe, Yoshiharu / Wakao, Takahiro / Sawamura, Eiji / Ehara, Terumasa / Shirai, Katsuhiko (1998): "Word sequence pair spotting for synchronization of speech and text in production of closed-caption TV programs for the hearing impaired", In ICSLP-1998, paper 1113.