This paper describes a method for automatically synchronizing TV news speech with its captions. A news item consists of sentences and often has a corresponding computerized text, which can be used as a caption. We have developed a new phonetic HMM-based word spotter. In this word spotter, the word sequences before and after a synchronization point are concatenated, and scoring is based on the state of the synchronization point. The detection accuracy of the proposed method is shown to be superior to that of a conventional method that uses no word sequence pair. Model configurations are presented for handling detection failures, an announcer's misstatements and restatements, and erroneous transcriptions. A 100% detection rate with no false alarms is achieved by combining multiple word sequence pairs in series. A 100% detection rate with few false alarms is obtained by using model configurations for misstatements or erroneous transcriptions.
Cite as: Maruyama, I., Abe, Y., Wakao, T., Sawamura, E., Ehara, T., Shirai, K. (1998) Word sequence pair spotting for synchronization of speech and text in production of closed-caption TV programs for the hearing impaired. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 1113, doi: 10.21437/ICSLP.1998-822
@inproceedings{maruyama98_icslp,
  author={Ichiro Maruyama and Yoshiharu Abe and Takahiro Wakao and Eiji Sawamura and Terumasa Ehara and Katsuhiko Shirai},
  title={{Word sequence pair spotting for synchronization of speech and text in production of closed-caption TV programs for the hearing impaired}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 1113},
  doi={10.21437/ICSLP.1998-822}
}