The problem of automatic excitement detection in baseball videos is considered and applied to highlights generation. This paper focuses on detecting exciting events in video using complementary information from the audio and video domains. First, a new measure of non-stationarity, which is highly effective in separating background from speech, is proposed. This new feature is employed in an unsupervised GMM-based segmentation algorithm that identifies the commentator's speech against the crowd background. Thereafter, the ``level of excitement'' is measured using features such as pitch, F1-F3 center frequencies, and spectral center of gravity extracted from the commentator's speech. Our experiments show that these features are well correlated with human assessment of excitability. Furthermore, slow-motion replays and pitching scenes are detected in the video to estimate scene end-points. Finally, audio/video information is fused to rank-order scenes by ``excitability'' and to generate highlights of user-defined time-lengths. The techniques described in this paper are generic and applicable to a variety of domains.
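The unsupervised GMM-based segmentation step can be illustrated with a minimal sketch. The paper's actual non-stationarity feature is not reproduced here; instead, synthetic one-dimensional frame features stand in for crowd background (low values) and commentator speech (high values), and a two-component Gaussian mixture (via scikit-learn, an assumed substitute for the authors' implementation) labels each frame.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for frame-level non-stationarity features:
# crowd background tends to low values, commentator speech to high values.
rng = np.random.default_rng(0)
background = rng.normal(0.2, 0.05, size=(300, 1))  # crowd noise frames
speech = rng.normal(0.8, 0.10, size=(300, 1))      # commentator speech frames
frames = np.vstack([background, speech])

# Unsupervised two-component GMM segmentation of the frame stream.
gmm = GaussianMixture(n_components=2, random_state=0).fit(frames)
labels = gmm.predict(frames)

# Heuristic: the component with the higher mean feature value is "speech".
speech_comp = int(np.argmax(gmm.means_.ravel()))
speech_mask = labels == speech_comp
print(f"frames labeled speech: {speech_mask.sum()} / {len(frames)}")
```

In a full system, the frames flagged as speech would then feed the excitement-level features (pitch, F1-F3, spectral center of gravity) described above.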
Bibliographic reference. Bořil, Hynek / Sangwan, Abhijeet / Hasan, Taufiq / Hansen, John H. L. (2010): "Automatic excitement-level detection for sports highlights generation", In INTERSPEECH-2010, 2202-2205.