12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Speaker Role Recognition Using Question Detection and Characterization

Thierry Bazillon (1), Benjamin Maza (2), Michael Rouvier (2), Frederic Bechet (1), Alexis Nasr (1)

(1) LIF, France
(2) LIA, France

Speech Data Mining is an area of research dedicated to characterizing audio streams that contain speech from one or more speakers, using descriptors related to the form and the content of the speech signal. Besides the word transcription, information about the type of audio stream and about the role and identity of speakers is crucial to support complex queries such as "seek debates on X" or "find all the interviews of Y". In this framework, we present a study of broadcast conversations that focuses on the way speakers express their questions. The initial intuition is that the type of questions asked can help identify the role (anchor, guest, expert, etc.) of a speaker in a conversation. By tagging these questions with a set of labels and using this information in addition to the commonly used descriptors to classify speakers' roles in broadcast conversations, we improve role classification accuracy, which validates our initial intuition.

Bibliographic reference. Bazillon, Thierry / Maza, Benjamin / Rouvier, Michael / Bechet, Frederic / Nasr, Alexis (2011): "Speaker role recognition using question detection and characterization", in INTERSPEECH-2011, 1333-1336.