This paper investigates the use of auxiliary information to assist a classical acoustic-based speaker identification system in the specific context of TV shows. The underlying assumption is that auxiliary information can help (1) to re-rank the n-best speaker hypotheses provided by the acoustic-only speaker identification system, (2) to provide a confidence score that refines the rejection process (open-set identification task), and (3) to identify speakers not covered by the speaker dictionary used by the speaker identification system (out-of-dictionary speakers; full-set verification task), the last point being one of the main issues when dealing with TV shows. In this paper, the auxiliary information is based on person names detected in overlaid text and in speech. Experiments conducted on three different datasets from the REPERE evaluation campaign highlight the value of this auxiliary information, notably the use of overlaid person names to identify out-of-dictionary speakers, and confirm the key assumptions made.
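As a purely illustrative sketch of the three roles listed above, and not the authors' actual method, the following Python function shows one way detected person names could interact with an n-best list. The function name `identify_speaker`, the additive `bonus`, the `reject_threshold`, and the rule of accepting a single unknown detected name are all assumptions introduced here for illustration; the abstract does not specify how the scores are combined.

```python
def identify_speaker(nbest, detected_names, dictionary,
                     bonus=0.2, reject_threshold=0.5):
    """Hypothetical fusion of acoustic n-best scores with detected names.

    nbest: list of (speaker, acoustic_score) pairs, higher is better.
    detected_names: set of person names found in overlaid text or speech.
    dictionary: set of speakers known to the acoustic system.
    Returns a speaker name, or None if the segment is rejected.
    """
    # (1) Re-rank: boost hypotheses whose name was also detected
    # (additive bonus is an assumption, not taken from the paper).
    rescored = sorted(
        ((spk, score + (bonus if spk in detected_names else 0.0))
         for spk, score in nbest),
        key=lambda pair: pair[1],
        reverse=True,
    )
    best_speaker, best_score = rescored[0] if rescored else (None, float("-inf"))

    # (2) Rejection: accept the top hypothesis only if its fused score
    # is confident enough (open-set decision).
    if best_score >= reject_threshold:
        return best_speaker

    # (3) Out-of-dictionary: if exactly one detected name is unknown to
    # the dictionary, propose it; otherwise reject.
    unknown = sorted(detected_names - dictionary)
    return unknown[0] if len(unknown) == 1 else None
```

For example, with `nbest = [("alice", 0.55), ("bob", 0.50)]` and the name `"bob"` detected on screen, the bonus moves `"bob"` to the top of the list; with a weak acoustic match and a single detected name absent from the dictionary, that name is proposed instead.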
Bibliographic reference. Charlet, Delphine / Fredouille, Corinne / Damnati, Géraldine / Senay, Grégory (2013): "Improving speaker identification in TV-shows using person name detection in overlaid text and speech", In INTERSPEECH-2013, 2778-2782.