5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Word Sequence Pair Spotting for Synchronization of Speech and Text in Production of Closed-Caption TV Programs for the Hearing Impaired

Ichiro Maruyama (1), Yoshiharu Abe (2), Takahiro Wakao (3), Eiji Sawamura (3), Terumasa Ehara (4), Katsuhiko Shirai (5)

(1) Telecommunications Advancement Organization (TAO) of Japan, Japan
(2) Mitsubishi Electric Corporation / TAO, Japan
(3) TAO, Japan
(4) NHK Science and Technical Research Laboratories / TAO, Japan
(5) Waseda University / TAO, Japan

This paper describes a method for automatically synchronizing TV news speech with its captions. A news item consists of sentences and often has a corresponding computerized text that can be used as a caption. We have developed a new phonetic HMM-based word spotter in which the word sequences before and after a synchronization point are concatenated, and scoring is based on the state at the synchronization point. The detection accuracy of the proposed method is shown to be superior to that of a conventional method that does not use word sequence pairs. Model configurations are presented for handling detection failures, an announcer's misstatements and restatements, and erroneous transcriptions. A 100% detection rate with no false alarms is achieved by combining multiple word sequence pairs in series, and a 100% detection rate with few false alarms is obtained by using the model configurations for misstatements or erroneous transcriptions.
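
As a rough illustration of the pair-spotting idea, the sketch below concatenates the symbol sequences before and after a synchronization point, searches for that concatenation in an observed sequence, and reports where the boundary between the two halves falls. It is a toy stand-in only: it replaces the paper's phonetic HMM acoustic scoring with edit distance over discrete symbols. The function name spot_pair and the max_cost threshold are assumptions made for this example and do not come from the paper.

```python
def spot_pair(observed, before, after, max_cost=1):
    """Toy word-sequence-pair spotter over discrete symbols.

    ASSUMPTION: edit distance stands in for HMM likelihood scoring.
    Returns (sync_index, cost), where sync_index is the position in
    `observed` at which the `after` sequence starts, or None if no
    match has cost <= max_cost.
    """
    pattern = list(before) + list(after)
    nb, m, n = len(before), len(pattern), len(observed)

    # cost[i][j]: cheapest alignment of pattern[:i] with some substring
    # of `observed` ending at j (row 0 is all zeros: free start point).
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    # sync[i][j]: observed index at which the boundary was crossed on
    # the best path into cell (i, j); defined for i >= nb.
    sync = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        cost[i][0] = i  # pattern symbols with nothing observed

    for i in range(m + 1):
        for j in range(n + 1):
            pi = pj = None
            if i > 0 and j > 0:
                mismatch = int(pattern[i - 1] != observed[j - 1])
                candidates = [
                    (cost[i - 1][j - 1] + mismatch, i - 1, j - 1),  # match/substitute
                    (cost[i - 1][j] + 1, i - 1, j),                 # pattern symbol missing
                    (cost[i][j - 1] + 1, i, j - 1),                 # extra observed symbol
                ]
                best = min(candidates, key=lambda c: c[0])
                cost[i][j], pi, pj = best
            elif i > 0:
                pi, pj = i - 1, 0
            if i == nb:
                sync[i][j] = j            # boundary state reached here
            elif i > nb:
                sync[i][j] = sync[pi][pj]  # inherit from predecessor

    hits = [(cost[m][j], sync[m][j]) for j in range(n + 1) if cost[m][j] <= max_cost]
    if not hits:
        return None
    best_cost, best_sync = min(hits, key=lambda h: h[0])
    return best_sync, best_cost


if __name__ == "__main__":
    print(spot_pair(list("xxabcdefyy"), "abc", "def"))
```

Raising max_cost loosely mimics tolerating a misstatement or a transcription error at the price of more false alarms; requiring several consecutive pairs to match in order would correspond, very roughly, to combining pairs in series.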

