10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Online Detecting End Times of Spoken Utterances for Synchronization of Live Speech and its Transcripts

Jie Gao, Qingwei Zhao, Yonghong Yan

Chinese Academy of Sciences, China

In this paper, we present our initial efforts on the task of Automatically Synchronizing live spoken Utterances with their Transcripts (textual contents) (ASUT). We address the problem of detecting, online, the end time of a spoken utterance given its textual content, which is one of the key problems of the ASUT task. A frame-synchronous likelihood ratio test (FS-LRT) procedure is proposed and explored under the hidden Markov model (HMM) framework. The properties of FS-LRT are studied empirically. Experiments indicate that the proposed approach performs satisfactorily. In addition, the proposed procedure has been successfully applied in a subtitling system for live broadcast news.

Bibliographic reference. Gao, Jie / Zhao, Qingwei / Yan, Yonghong (2009): "Online detecting end times of spoken utterances for synchronization of live speech and its transcripts", in INTERSPEECH-2009, 2115-2118.