14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Combining Acoustic Name Spotting and Continuous Context Models to improve Spoken Person Name Recognition in Speech

Benjamin Bigot, Grégory Senay, Georges Linarès, Corinne Fredouille, Richard Dufour

LIA, France

Retrieving pronounced person names in spoken documents is a critical problem in the context of audiovisual content indexing. In this paper, we present a cascading strategy that combines two methods dedicated to spoken name recognition in speech. The first method is an acoustic name spotter operating on phoneme confusion networks. It relies on a phonetic edit distance criterion that exploits the phoneme probabilities held in the confusion networks. The second method is a continuous context modelling approach applied to the 1-best transcription output. It relies on a probabilistic modelling of name-to-context dependencies. We assume that combining these methods, which draw on different types of information, may improve spoken name recognition performance. This assumption is studied through experiments on a set of audiovisual documents from the development set of the REPERE challenge. Results show that combining the acoustic and linguistic methods yields an absolute gain of 3% in F-measure over the best system taken alone.
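The acoustic spotting criterion described above can be illustrated with a small dynamic-programming sketch. This is a hypothetical minimal example, not the paper's actual formulation: the confusion network is modelled as one dictionary of phoneme-to-posterior probabilities per time slot, the substitution cost is taken as one minus the posterior of the expected phoneme, and all names, phoneme symbols, and cost values are illustrative assumptions.

```python
# Hedged sketch: weighted edit distance between a name's phoneme sequence
# and a phoneme confusion network (one dict phoneme -> posterior per slot).
# Costs and representation are assumptions for illustration only.

def cn_edit_distance(name_phones, confusion_network,
                     ins_cost=1.0, del_cost=1.0):
    """Alignment cost between a phoneme string and a confusion network.

    Lower scores indicate a better acoustic match of the name.
    """
    n, m = len(name_phones), len(confusion_network)
    # dp[i][j]: cost of aligning the first i phonemes with the first j slots
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * del_cost
    for j in range(1, m + 1):
        dp[0][j] = j * ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Substitution cost decreases with the posterior probability
            # that the slot contains the expected phoneme.
            sub = 1.0 - confusion_network[j - 1].get(name_phones[i - 1], 0.0)
            dp[i][j] = min(dp[i - 1][j] + del_cost,      # phoneme deleted
                           dp[i][j - 1] + ins_cost,      # slot inserted
                           dp[i - 1][j - 1] + sub)       # substitution
    return dp[n][m]

# Illustrative use: spotting a 4-phoneme name in a 4-slot network.
cn = [{'l': 0.8, 'r': 0.2},
      {'j': 0.6, 'i': 0.4},
      {'o~': 0.9, 'o': 0.1},
      {'n': 0.7, 'm': 0.3}]
score = cn_edit_distance(['l', 'j', 'o~', 'n'], cn)  # 0.2+0.4+0.1+0.3 = 1.0
```

A name hypothesis would then be accepted when this score falls below a tuned threshold; a network whose slots all carry the expected phonemes with posterior 1.0 gives a score of 0.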

Full Paper

Bibliographic reference.  Bigot, Benjamin / Senay, Grégory / Linarès, Georges / Fredouille, Corinne / Dufour, Richard (2013): "Combining acoustic name spotting and continuous context models to improve spoken person name recognition in speech", In INTERSPEECH-2013, 2539-2543.