Unsupervised paradigms for domain-independent video structure analysis
Video structure analysis consists in dividing a video into elementary structural units such as anchor shots or interviews. Most approaches to structure analysis follow a supervised train/detect paradigm. For example, machine learning techniques have been widely used to detect anchor shots, specific actions, etc. Such paradigms have proven highly efficient on specific content but lack domain and genre independence. To overcome the limitations of current techniques, we will investigate unsupervised paradigms for robust video structure analysis.
In recent years, we have been working on discovery algorithms that find, in a totally unsupervised fashion, coherent or repeating elements in audio and video streams. In a very general way, the problem of unsupervised discovery can be seen as a particular case of a clustering problem. For instance, in audio content, we have proposed variability-tolerant pattern matching techniques to discover repeating chunks of signal corresponding to word-like units. In video content, we have used audiovisual consistency between audio and visual clusters to discover structural elements such as anchor person or guest shots in games and talk shows.
In parallel, we have been working on topic segmentation of TV programs based on their automatic transcription, developing domain-independent methods that are robust to transcription errors and require no prior knowledge of topics. In particular, robustness can be obtained by relying on sources of information other than the transcribed speech material, such as audio events (pauses, speaker changes, etc.) or visual events (shot changes, anchor shots, etc.).
The goal of this post-doctoral position is to experiment further with unsupervised discovery paradigms for robust structure analysis. The post-doctoral researcher will lead research on the following topics:
1. Unsupervised discovery paradigms in audio and video content: (a) Improve current algorithms, both in performance and in computational burden. For example, one can rely on discriminative models built automatically from the result of an initial discovery step to improve performance. (b) Propose innovative solutions to define a mapping from discovered elements to semantically meaningful events.
2. Apply discovery paradigms to video segmentation and, in particular, to topic segmentation (accounting for structural elements, transcript-free segmentation, etc.).
The work will be carried out jointly in the Multimedia group and in the Speech and Audio Processing group at INRIA Rennes, France, in the framework of the OSEO-funded project QUAERO. The position is to be filled as soon as possible, for a duration of 1 year, renewable once. Prospective candidates should have a strong background in at least one of the following domains: pattern recognition (preferably applied to speech or video processing), machine learning, multimedia, data mining. Salary depends on experience.
Guillaume Gravier (firstname.lastname@example.org)
Mathieu Ben (email@example.com)
To apply, please send a resume, a short summary of previous work, and contact details for references.
INRIA Rennes: http://www.inria.fr/rennes
Multimedia Group Texmex: http://www.irisa.fr/texmex
Speech and Audio Processing Group Metiss: http://www.irisa.fr/metiss
Quaero project: http://www.quaero.org
 Armando Muscariello, Guillaume Gravier and Frédéric Bimbot. Audio keyword extraction by unsupervised word discovery. In Proc. Conf. of the Intl. Speech Communication Association (Interspeech), 2009.
 Camille Guinaudeau, Guillaume Gravier and Pascale Sébillot. Improving ASR-based topic segmentation of TV programs with confidence measures and semantic relations. Submitted to Conf. of the Intl. Speech Communication Association (Interspeech), 2010.