This paper reports the first known effort to automatically align the spoken utterances in recorded lectures with the content of the slides used. Such technology would be useful for Massive Open Online Courses (MOOCs) and other recorded lectures, as well as many other applications. We propose a set of approaches that address two observations: the words helpful for such alignment are sparse and noisy, and the presentation of a slide usually proceeds smoothly from top to bottom. The approaches include utterance clustering, entropy-based word filtering, reliability-propagated word-based matching, and a structured support vector machine (SVM) that learns from local and global features. Initial experiments on the lectures of a course offered at National Taiwan University yielded encouraging results compared with the baseline approaches.
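As a rough illustration of what entropy-based word filtering could look like, the sketch below scores each word by the entropy of its distribution over slides and keeps only the low-entropy words, on the assumption that a word spread evenly across all slides says little about which slide an utterance belongs to. The abstract does not specify the formulation used in the paper, so the scoring rule, the function names `word_entropy` and `filter_words`, and the `max_entropy` threshold are all hypothetical.

```python
import math


def word_entropy(word, slides):
    """Entropy (in bits) of a word's occurrence distribution over slides.

    `slides` is a list of token lists, one per slide. Returns None if the
    word never occurs. This scoring rule is an assumption made for
    illustration, not the formulation from the paper.
    """
    counts = [slide.count(word) for slide in slides]
    total = sum(counts)
    if total == 0:
        return None
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def filter_words(slides, max_entropy):
    """Keep words whose entropy over slides is at most `max_entropy`.

    Low entropy means the word is concentrated in a few slides, so it is
    more likely to help decide which slide an utterance refers to.
    """
    vocab = {w for slide in slides for w in slide}
    return {w for w in vocab if word_entropy(w, slides) <= max_entropy}


# Toy example: "the" appears on every slide, the other words on one each.
slides = [
    ["the", "svm", "margin"],
    ["the", "entropy", "filter"],
    ["the", "alignment"],
]
kept = filter_words(slides, max_entropy=1.0)
```

In this toy input, "the" is spread uniformly over three slides and is dropped, while the slide-specific words are kept. A real system would also have to handle recognition errors in the transcripts, which this sketch ignores.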
Bibliographic reference. Lu, Han / Shen, Sheng-syun / Shiang, Sz-Rung / Lee, Hung-yi / Lee, Lin-shan (2014): "Alignment of spoken utterances with slide content for easier learning with recorded lectures using structured support vector machine (SVM)", In INTERSPEECH-2014, 1473-1477.