This paper presents a system that uses acoustic stress detection to identify important concepts in educational videos. The proposed system is part of a non-linear navigation framework that also includes a dynamic word cloud and a 2-D timeline. An important feature of the word cloud is that each word's color depicts its spoken emphasis, which is estimated by quantifying the acoustic stress of the word. Stressed instances of a given word are also highlighted on the 2-D timeline using distinct colors. The primary focus of this paper is to detect words spoken with higher acoustic stress and to provide an efficient means of navigating to the corresponding instances. In the training phase, words are labeled manually as `stressed' or `unstressed' by speech experts. An SVM classifier is trained using three types of acoustic features: intensity-based, pitch-based, and duration-based. Considering the data imbalance in terms of the ratio of `stressed' to `unstressed' words, the performance achieved (70% correct detection at a false-alarm rate of 19%) is satisfactory. Usability studies show that the time taken to detect and navigate to stressed instances of words is significantly shorter (p < 0.01) than with a YouTube-style baseline system.
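The classification setup described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three feature values (mean intensity, pitch range, duration) are hypothetical stand-ins for the intensity-, pitch-, and duration-based features named in the abstract, the training data is synthetic, and class weighting is shown as one common way to handle the stressed/unstressed imbalance the paper mentions.

```python
# Hedged sketch (not the authors' code): an SVM stress classifier over three
# hypothetical per-word acoustic features -- mean intensity (dB), pitch range
# (semitones), and duration (s) -- with class weighting to offset the
# 'stressed' vs 'unstressed' imbalance noted in the abstract.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def synth_words(n, stressed):
    """Generate synthetic feature vectors; a real system would extract these
    from the audio of each word (the values here are illustrative only)."""
    base = np.array([70.0, 4.0, 0.40]) if stressed else np.array([60.0, 2.0, 0.25])
    return base + rng.normal(scale=[3.0, 0.8, 0.05], size=(n, 3))

# Imbalanced data: far fewer stressed than unstressed words, as in the paper.
X = np.vstack([synth_words(40, True), synth_words(360, False)])
y = np.array([1] * 40 + [0] * 360)

# class_weight='balanced' re-weights errors by inverse class frequency,
# so the minority 'stressed' class is not simply ignored.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
clf.fit(X, y)

# Classify one clearly emphasized word and one neutral word.
preds = clf.predict([[72.0, 4.5, 0.45], [59.0, 1.8, 0.22]])
print(preds)
```

Standardizing the features before the SVM matters here because intensity, pitch range, and duration live on very different numeric scales.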
Bibliographic reference. Patil, Sonal / Arsikere, Harish / Deshmukh, Om (2015): "Acoustic stress detection for improved navigation of educational videos", In INTERSPEECH-2015, 1882-1883.