In some speaker recognition scenarios, conversations are recorded simultaneously over multiple channels. This is the case for the interviews in the NIST SRE dataset. To exploit this, we propose a modification of the PLDA model that includes two different inter-session variability terms. The first term is tied across all recordings belonging to the same conversation, whereas the second is not. Thus, the former mainly intends to capture the variability due to the phonetic content of the conversation, while the latter tries to capture the channel variability. We test this approach on the NIST SRE12 core condition, using multiple channels per interview to enroll the speakers. Compared to the standard PLDA (scored by the book), the proposed approach improves the minimum DCF by 26.29% on telephone speech and by 1.8% on interviews.
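The effect of tying one variability term per conversation can be illustrated with a toy model. The sketch below assumes a simplified one-dimensional form, phi_r = mu + v*y + u*x_c(r) + eps_r, where y is the speaker factor, x_c is a conversation factor shared by every channel of conversation c, and eps_r is an untied per-recording (channel) term. This is an illustrative assumption, not the exact formulation or training procedure of the paper. The function names `joint_covariance` and `sample_recordings` and their parameters are hypothetical. The sketch shows which recordings of one speaker end up correlated and by how much.

```python
import random
from typing import Dict, List


def joint_covariance(conv_ids: List[int], var_spk: float,
                     var_conv: float, var_chan: float) -> List[List[float]]:
    """Covariance between 1-D embeddings of ONE speaker under the assumed model
    phi_r = mu + v*y + u*x_{c(r)} + eps_r (hypothetical simplified PLDA).

    conv_ids[r] is the conversation that recording r belongs to; recordings of
    the same conversation on different channels share the same id.
    var_spk = v**2, var_conv = u**2, var_chan = Var(eps_r).
    """
    n = len(conv_ids)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            c = var_spk                      # speaker factor y: shared by all recordings
            if conv_ids[i] == conv_ids[j]:
                c += var_conv                # conversation factor: tied across channels
            if i == j:
                c += var_chan                # channel term: independent per recording
            cov[i][j] = c
    return cov


def sample_recordings(conv_ids: List[int], mu: float, v: float, u: float,
                      s: float, rng: random.Random) -> List[float]:
    """Draw one speaker's recordings from the same assumed generative model."""
    y = rng.gauss(0.0, 1.0)                                       # speaker factor
    x: Dict[int, float] = {c: rng.gauss(0.0, 1.0) for c in sorted(set(conv_ids))}
    # Each recording adds its own channel noise on top of the tied factors.
    return [mu + v * y + u * x[c] + s * rng.gauss(0.0, 1.0) for c in conv_ids]


if __name__ == "__main__":
    # Two channels of conversation 0 and one recording of conversation 1.
    for row in joint_covariance([0, 0, 1], var_spk=1.0, var_conv=0.5, var_chan=0.25):
        print(row)
```

With `conv_ids=[0, 0, 1]`, the two channels of conversation 0 share both the speaker and conversation variance, while the recording from conversation 1 shares only the speaker variance. This is the distinction that a standard PLDA, with a single untied inter-session term, cannot express.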
Bibliographic reference. Villalba, Jesús / Diez, Mireia / Varona, Amparo / Lleida, Eduardo (2013): "Handling recordings acquired simultaneously over multiple channels with PLDA", In INTERSPEECH-2013, 2509-2513.