Odyssey 2008: The Speaker and Language Recognition Workshop
Stellenbosch, South Africa
In this paper we present language detectors built using relatively small amounts of training data. For languages with little training data, we rely on the modelling power of a Linear Discriminant Analysis (LDA) back-end. We present experiments on NIST 2005 Language Recognition Evaluation data, where we use a jackknifing technique to remove well-trained language knowledge from the LDA back-end, using only sparse trials for training the LDA. We investigate three systems, which show different levels of loss of language detection capability. We validate the technique on an independent collection of 21 languages, where we show that with less than one hour of training data we obtain an error rate for ‘new’ languages that is only slightly over twice the error rate for languages for which the full 60 hours of CallFriend data are available.
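The abstract does not spell out the jackknifing procedure, so the following is only a minimal sketch of one possible reading: each language in turn is treated as 'new' and keeps only a few trials in the back-end training set, while every other language keeps all of its trials. The function name `jackknife_backend_sets`, the parameter `max_sparse`, and the toy language labels are hypothetical and do not come from the paper.

```python
import random
from collections import defaultdict


def jackknife_backend_sets(trials, max_sparse, seed=0):
    """Build one back-end training set per held-out language.

    `trials` is a list of (language, trial_id) pairs. In the set built
    for a given language, that language is reduced to at most
    `max_sparse` randomly chosen trials; all other languages are kept
    in full. This is an assumed reading of the setup, not the authors'
    published procedure.
    """
    rng = random.Random(seed)
    by_lang = defaultdict(list)
    for lang, trial_id in trials:
        by_lang[lang].append(trial_id)

    sets = {}
    for held_out in sorted(by_lang):
        train = []
        for lang in sorted(by_lang):
            ids = by_lang[lang]
            if lang == held_out:
                # Simulate a 'new' language: keep only a few trials.
                ids = rng.sample(ids, min(max_sparse, len(ids)))
            train.extend((lang, t) for t in ids)
        sets[held_out] = train
    return sets


# Toy data: three hypothetical languages with ten trials each.
trials = [(lang, i) for lang in ("eng", "spa", "tam") for i in range(10)]
sets = jackknife_backend_sets(trials, max_sparse=2)
```

Each entry of `sets` could then be used to train a separate back-end, which would be evaluated on the held-out language as if it were sparsely trained.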
Bibliographic reference. Leeuwen, David A. van / Brümmer, Niko (2008): "Building language detectors using small amounts of training data", In Odyssey-2008, paper 015.