ISCA Archive Interspeech 2009

Improving detection of acoustic events using audiovisual data and feature level fusion

T. Butko, C. Canton-Ferrer, C. Segura, X. Giró, C. Nadeu, J. Hernando, J. R. Casas

The detection of acoustic events (AEs) naturally produced in a meeting room may help describe the human and social activity that takes place in it. When applied to spontaneous recordings, the detection of AEs from audio information alone produces a large number of errors, mostly due to temporal overlapping of sounds. In this paper, a system to detect and recognize AEs using both audio and video information is presented. A feature-level fusion strategy is used, and the structure of the HMM-GMM based system considers each class separately, using a one-against-all strategy for training. Experimental acoustic event detection (AED) results on a new, rather spontaneous dataset are presented, showing the advantage of the proposed approach.
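The two ideas named in the abstract can be illustrated with a minimal sketch: feature-level (early) fusion concatenates time-aligned audio and video feature vectors into a single observation per frame, and one-against-all training relabels the data so each AE class gets its own binary detector. The function names, toy feature values, and class labels below are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch of feature-level fusion and one-against-all relabeling,
# assuming per-frame audio and video features are already time-aligned.
# All names and values here are hypothetical, for illustration only.

def fuse_features(audio_frame, video_frame):
    """Concatenate synchronized audio and video feature vectors
    into a single fused observation (early fusion)."""
    return list(audio_frame) + list(video_frame)

def one_against_all_labels(labels, target_class):
    """Relabel frames: 1 for the target AE class, 0 for all others,
    so a separate binary model can be trained per class."""
    return [1 if lab == target_class else 0 for lab in labels]

# Toy data: 3 audio coefficients + 2 video features per frame
audio = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
video = [[10.0, 20.0], [30.0, 40.0]]
fused = [fuse_features(a, v) for a, v in zip(audio, video)]

labels = ["speech", "door_slam"]
door_targets = one_against_all_labels(labels, "door_slam")
```

The fused vectors would then feed a per-class HMM-GMM detector; with one-against-all training, each class model sees its own frames as positives and every other class as a single background class.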


doi: 10.21437/Interspeech.2009-334

Cite as: Butko, T., Canton-Ferrer, C., Segura, C., Giró, X., Nadeu, C., Hernando, J., Casas, J.R. (2009) Improving detection of acoustic events using audiovisual data and feature level fusion. Proc. Interspeech 2009, 1147-1150, doi: 10.21437/Interspeech.2009-334

@inproceedings{butko09_interspeech,
  author={T. Butko and C. Canton-Ferrer and C. Segura and X. Giró and C. Nadeu and J. Hernando and J. R. Casas},
  title={{Improving detection of acoustic events using audiovisual data and feature level fusion}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={1147--1150},
  doi={10.21437/Interspeech.2009-334}
}