EUROSPEECH 2003 - INTERSPEECH 2003
Formal evaluations conducted by NIST in 1996 demonstrated that systems using parallel banks of tokenizer-dependent language models produced the best language identification performance. Since then, other approaches to language identification have been developed that match or surpass the performance of phone-based systems. This paper describes and evaluates three techniques that have been applied to the language identification problem: phone recognition, Gaussian mixture modeling, and support vector machine classification. A recognizer that fuses the scores of three systems employing these techniques achieves a 2.7% equal error rate (EER) on the 1996 NIST evaluation set and a 2.8% EER on the NIST 2003 primary condition evaluation set. An approach to dealing with the problem of out-of-set data is also discussed.
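As a rough illustration of the metric quoted above, the sketch below shows one common way to approximate an equal error rate from detection scores, along with a plain equal-weight score average. Both function names (`equal_error_rate`, `fuse_scores`) and the toy scores are hypothetical. The averaging step is only a stand-in assumption: the abstract does not say how the three systems' scores are fused, and the official NIST scoring procedure may differ from this threshold sweep.

```python
def fuse_scores(system_scores, weights=None):
    """Weighted average of per-trial scores from several systems.

    Assumption: simple linear fusion, used here only for illustration;
    it is not claimed to be the fusion method used in the paper.
    """
    n_systems = len(system_scores)
    if weights is None:
        weights = [1.0 / n_systems] * n_systems
    n_trials = len(system_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, system_scores))
        for i in range(n_trials)
    ]


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The EER is
    taken at the threshold where miss and false-alarm rates are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # Miss: a target-language trial rejected (score below threshold).
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target trial accepted (score at or above threshold).
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            best_gap, eer = gap, (p_miss + p_fa) / 2
    return eer


# Toy scores (made up for illustration, not data from the paper).
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))
```

With real evaluation data, the target and non-target lists would hold the fused scores of trials where the hypothesized language is correct and incorrect, respectively.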
Bibliographic reference. Singer, E. / Torres-Carrasquillo, P.A. / Gleason, T.P. / Campbell, W.M. / Reynolds, Douglas A. (2003): "Acoustic, phonetic, and discriminative approaches to automatic language identification", In EUROSPEECH-2003, 1345-1348.