14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Unsupervised Naming of Speakers in Broadcast TV: Using Written Names, Pronounced Names or Both?

Johann Poignant (1), Laurent Besacier (1), Viet Bac Le (2), Sophie Rosset (3), Georges Quénot (1)

(1) LIG (UMR 5217), France
(2) Vocapia Research, France
(3) LIMSI, France

Identifying persons in TV broadcast video is a valuable tool for indexing such content. However, relying on biometric models is not a very sustainable option without a priori knowledge of the people present in the videos. Names pronounced in the audio (pronounced names, PN) or written on the screen (written names, WN) can provide hypothesis names for speakers. We propose an experimental comparison of the potential of these two modalities (pronounced or written names) to extract the true names of the speakers. Pronounced names offer many citation instances, but transcription and named-entity detection errors halve the potential of this modality. In contrast, written-name detection benefits from improvements in video quality and is nowadays rather robust and efficient for naming speakers. Oracle experiments on the mapping between written names and speakers also show the complementarity of the PN and WN modalities.
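The abstract does not spell out how the "potential" of a modality is measured. As a hedged illustration only, assume that a speaker counts as nameable by a modality when its reference name appears anywhere among the names that modality extracts from the same show. Under that assumption, the per-modality comparison and the union used to gauge PN/WN complementarity could be sketched as follows. The function names and toy names are hypothetical, and this is not claimed to be the paper's evaluation protocol.

```python
from typing import Dict, Iterable, Set


def nameable_fraction(speakers: Set[str], hypotheses: Iterable[str]) -> float:
    """Fraction of reference speaker names found among the hypothesis names.

    Assumption (hypothetical): a speaker is "nameable" if its exact reference
    name occurs at least once in the extracted names, regardless of timing.
    """
    if not speakers:
        return 0.0
    found = speakers & set(hypotheses)
    return len(found) / len(speakers)


def compare_modalities(
    speakers: Set[str], written: Iterable[str], pronounced: Iterable[str]
) -> Dict[str, float]:
    """Nameable fraction for written names, pronounced names, and their union."""
    wn = set(written)
    pn = set(pronounced)
    return {
        "WN": nameable_fraction(speakers, wn),
        "PN": nameable_fraction(speakers, pn),
        # The union shows how much the two modalities complement each other.
        "WN+PN": nameable_fraction(speakers, wn | pn),
    }


if __name__ == "__main__":
    # Toy data, purely illustrative (not results from the paper).
    speakers = {"alice_martin", "bob_durand", "claire_petit", "david_roux"}
    written = ["alice_martin", "bob_durand", "eve_blanc"]
    pronounced = ["bob_durand", "claire_petit", "claire_petit"]
    print(compare_modalities(speakers, written, pronounced))
    # {'WN': 0.5, 'PN': 0.5, 'WN+PN': 0.75}
```

With these toy inputs each modality alone names half of the speakers, while their union names three out of four, which is the kind of complementarity an oracle comparison would surface.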


Bibliographic reference.  Poignant, Johann / Besacier, Laurent / Le, Viet Bac / Rosset, Sophie / Quénot, Georges (2013): "Unsupervised naming of speakers in broadcast TV: using written names, pronounced names or both?", In INTERSPEECH-2013, 1462-1466.