ISCA Archive Odyssey 2016

Out-of-Set i-Vector Selection for Open-set Language Identification

Hamid Behravan, Tomi Kinnunen, Ville Hautamäki

Current language identification (LID) systems are based on an i-vector classifier followed by a multi-class recognition back-end. Identification accuracy degrades considerably when LID systems face open-set data. In this study, we propose an approach to the problem of out-of-set (OOS) data detection in the context of open-set language identification. In our approach, each unlabeled i-vector in the development set is given a per-class outlier score computed with the non-parametric Kolmogorov-Smirnov (KS) test. OOS data detected in the unlabeled development set are then used to train an additional model that represents OOS languages in the back-end. The proposed approach achieves a 16% relative decrease in equal error rate (EER) over classical OOS detection methods in discriminating in-set and OOS languages. With a support vector machine (SVM) as the language back-end classifier, integrating the proposed method into the LID back-end yields a 15% relative decrease in identification cost compared with using the entire development set as OOS candidates.
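The abstract does not spell out how the per-class outlier score is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that, for each in-set language, the cosine distances from an unlabeled i-vector to that language's i-vectors are compared against the language's within-class pairwise distances using the two-sample KS statistic, and that an i-vector is an OOS candidate when it is an outlier to every language (the minimum over classes). The choice of cosine distance, the min rule, and the names `ks_statistic`, `outlier_scores`, `oos_score`, and `make_cluster` are all assumptions made for illustration.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past x so that ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def cosine_distance(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


def outlier_scores(x, classes):
    """Per-class outlier score in [0, 1] for one unlabeled vector x.

    classes maps a language label to a list of its i-vectors.
    Assumption: the score is the KS statistic between distances from x to
    the class and the class's own within-class pairwise distances.
    """
    scores = {}
    for label, vecs in classes.items():
        within = [
            cosine_distance(vecs[i], vecs[j])
            for i in range(len(vecs))
            for j in range(i + 1, len(vecs))
        ]
        to_x = [cosine_distance(x, v) for v in vecs]
        scores[label] = ks_statistic(to_x, within)
    return scores


def oos_score(x, classes):
    """Assumed decision rule: x is OOS-like only if it is an outlier to every class."""
    return min(outlier_scores(x, classes).values())


def make_cluster(rng, axis, n, dim=10, scale=5.0, noise=0.5):
    """Hypothetical toy data: n noisy vectors around scale * e_axis."""
    return [
        [(scale if k == axis else 0.0) + rng.gauss(0.0, noise) for k in range(dim)]
        for _ in range(n)
    ]
```

On toy data, two in-set languages can be simulated with `make_cluster(rng, 0, 30)` and `make_cluster(rng, 1, 30)`; a vector drawn around a third axis should then receive a higher `oos_score` than one drawn from an in-set cluster. Thresholding this score to select OOS candidates for the additional back-end model is left open here, since the abstract does not describe how the threshold is chosen.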

doi: 10.21437/Odyssey.2016-44

Cite as: Behravan, H., Kinnunen, T., Hautamäki, V. (2016) Out-of-Set i-Vector Selection for Open-set Language Identification. Proc. The Speaker and Language Recognition Workshop (Odyssey 2016), 303-310, doi: 10.21437/Odyssey.2016-44

  author={Hamid Behravan and Tomi Kinnunen and Ville Hautamäki},
  title={{Out-of-Set i-Vector Selection for Open-set Language Identification}},
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2016)},