Retrieving pronounced person names in spoken documents is a critical problem in audiovisual content indexing. In this paper, we present a cascading strategy that combines two methods dedicated to spoken name recognition. The first is an acoustic name spotter operating on phoneme confusion networks; it relies on a phonetic edit distance criterion weighted by the phoneme posterior probabilities held in the confusion networks. The second is a continuous context modelling approach applied to the 1-best transcription output; it relies on a probabilistic modelling of name-to-context dependencies. We assume that combining these methods, which exploit different types of information, may improve spoken name recognition performance. This assumption is evaluated through experiments on a set of audiovisual documents from the development set of the REPERE challenge. Results show that combining the acoustic and linguistic methods yields an absolute gain of 3% in F-measure over the best single system.
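The acoustic name spotter described above can be illustrated with a minimal sketch: a Wagner–Fischer edit distance where the substitution cost at each confusion-network slot is discounted by the posterior probability that the slot contains the name's phoneme. This is a simplified, hypothetical rendering of the paper's criterion (the function name, cost scheme, and data layout are assumptions, not the authors' exact formulation):

```python
def name_spotting_score(name_phonemes, confusion_network):
    """Probability-weighted edit distance between a name's phoneme
    sequence and a phoneme confusion network, represented here as a
    list of slots, each mapping phoneme -> posterior probability.
    A low score means the name is likely present in the signal.
    NOTE: a simplified sketch, not the paper's exact criterion."""
    n, m = len(name_phonemes), len(confusion_network)
    # dp[i][j] = min cost of aligning the first i name phonemes
    # with the first j network slots
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = float(i)      # unmatched name phonemes cost 1 each
    for j in range(1, m + 1):
        dp[0][j] = float(j)      # skipped network slots cost 1 each
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # substitution is cheap when the slot assigns the expected
            # phoneme a high posterior (cost 0 for probability 1)
            sub = 1.0 - confusion_network[j - 1].get(name_phonemes[i - 1], 0.0)
            dp[i][j] = min(dp[i - 1][j] + 1.0,       # deletion
                           dp[i][j - 1] + 1.0,       # insertion
                           dp[i - 1][j - 1] + sub)   # substitution/match
    return dp[n][m]
```

For example, spotting the (hypothetical) name /b i g o/ in a four-slot network where each slot gives the expected phoneme high posterior yields a small score, whereas mismatched phonemes drive the score up; thresholding this score turns the alignment into a detector.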
Bibliographic reference. Bigot, Benjamin / Senay, Grégory / Linarès, Georges / Fredouille, Corinne / Dufour, Richard (2013): "Combining acoustic name spotting and continuous context models to improve spoken person name recognition in speech", In INTERSPEECH-2013, 2539-2543.