INTERSPEECH 2013
14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Handling Recordings Acquired Simultaneously Over Multiple Channels with PLDA

Jesús Villalba (1), Mireia Diez (2), Amparo Varona (2), Eduardo Lleida (1)

(1) Universidad de Zaragoza, Spain
(2) Universidad del País Vasco, Spain

In some speaker recognition scenarios, conversations are recorded simultaneously over multiple channels, as is the case for the interviews in the NIST SRE dataset. To take advantage of this, we propose a modification of the PLDA model that includes two different inter-session variability terms. The first term is tied across all recordings belonging to the same conversation, whereas the second is not. The former is thus mainly intended to capture variability due to the phonetic content of the conversation, while the latter is intended to capture channel variability. We test this approach on the NIST SRE12 core condition, using multiple channels per interview to enroll the speakers. Compared to standard PLDA (scored by the book), the proposed approach improves the minimum DCF by 26.29% on telephone speech and by 1.8% on interviews.
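To make the idea of a tied and an untied inter-session term concrete, the following is a minimal toy sketch, not the model from the paper. It assumes scalar observations built as a plain sum of a speaker term, a conversation term shared by every channel of that conversation, and an independent per-recording channel term. The function name `sample_speaker` and all standard deviations are hypothetical, chosen only for illustration; the actual model operates on vectors and its exact parameterization is not given in the abstract.

```python
import random
import statistics


def sample_speaker(rng, n_conversations, n_channels,
                   speaker_std=1.0, conversation_std=0.5, channel_std=0.3):
    """Toy generative sketch (assumed, scalar): one list of channel recordings per conversation."""
    speaker = rng.gauss(0.0, speaker_std)  # shared by all recordings of the speaker
    recordings = []
    for _ in range(n_conversations):
        # Tied term: drawn once and reused for every channel of this conversation
        conversation = rng.gauss(0.0, conversation_std)
        # Untied term: drawn independently for each recording (channel)
        recordings.append([speaker + conversation + rng.gauss(0.0, channel_std)
                           for _ in range(n_channels)])
    return recordings


rng = random.Random(0)
recs = sample_speaker(rng, n_conversations=5000, n_channels=3)

# Variance across channels within a conversation reflects only the untied term
within_conv_var = statistics.fmean([statistics.variance(conv) for conv in recs])
# Variance of conversation means reflects the tied term plus averaged channel noise
between_conv_var = statistics.variance([statistics.fmean(conv) for conv in recs])
```

Under these assumptions, `within_conv_var` should be close to `channel_std**2`, and `between_conv_var` close to `conversation_std**2 + channel_std**2 / n_channels`. This is what lets simultaneous multi-channel recordings separate the two sources of variability in this toy setting.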


Bibliographic reference.  Villalba, Jesús / Diez, Mireia / Varona, Amparo / Lleida, Eduardo (2013): "Handling recordings acquired simultaneously over multiple channels with PLDA", In INTERSPEECH-2013, 2509-2513.