In this paper, we present our initial efforts on the task of Automatically Synchronizing live spoken Utterances with their Transcripts (textual contents) (ASUT). We address the problem of detecting, online, the end time of a spoken utterance given its textual content, which is one of the key problems of the ASUT task. A frame-synchronous likelihood ratio test (FS-LRT) procedure is proposed and explored under the hidden Markov model (HMM) framework. The properties of FS-LRT are studied empirically. Experiments indicate that the proposed approach achieves satisfactory performance. In addition, the proposed procedure has been successfully applied in a subtitling system for live broadcast news.
Bibliographic reference. Gao, Jie / Zhao, Qingwei / Yan, Yonghong (2009): "Online detecting end times of spoken utterances for synchronization of live speech and its transcripts", In INTERSPEECH-2009, 2115-2118.