One successful approach to language recognition is to focus on the most discriminative high-level features of languages, such as phones and words. In this paper, we apply a similar approach to acoustic features, using a single GMM tokenizer followed by discriminatively trained language models. A feature selection technique based on the Support Vector Machine (SVM) is used to model higher-order n-grams. Three different ways to build this tokenizer are explored and compared using a discriminative uni-gram system and a generative GMM-UBM system. A discriminative uni-gram using a very large GMM tokenizer with 24,576 components yields an EER of 1.66%, falling to 0.71% when fused with other acoustic approaches, on the NIST'03 LRE 30s evaluation.
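As a rough illustration of the tokenization idea described above, the following Python sketch assigns each acoustic frame to its highest-scoring GMM component and builds a normalized uni-gram vector of component counts for an utterance. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the diagonal covariances, the top-1 component assignment and the toy parameters are all hypothetical, and the discriminative (SVM) back end is omitted.

```python
import numpy as np

def tokenize_frames(frames, weights, means, variances):
    """Map each frame to the index of its highest-scoring Gaussian component.

    Assumed shapes (hypothetical, for illustration only):
      frames: (T, D), weights: (M,), means: (M, D), variances: (M, D) diagonal.
    """
    diff = frames[:, None, :] - means[None, :, :]               # (T, M, D)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (M,)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)   # (T, M)
    # Weighted log-likelihood of each frame under each component.
    log_score = np.log(weights)[None, :] - 0.5 * (log_det[None, :] + mahal)
    return np.argmax(log_score, axis=1)

def unigram_vector(tokens, n_components):
    """Relative frequency of each component index in the token sequence."""
    counts = np.bincount(tokens, minlength=n_components).astype(float)
    return counts / max(counts.sum(), 1.0)

# Toy example: two well-separated components in two dimensions.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [5.0, 5.0]])
variances = np.ones((2, 2))
frames = np.array([[0.1, -0.2], [4.9, 5.2], [5.1, 4.8], [5.0, 5.0]])

tokens = tokenize_frames(frames, weights, means, variances)
features = unigram_vector(tokens, n_components=len(weights))
print(tokens, features)
```

In a full system, one such vector per utterance would be passed to a discriminatively trained classifier such as an SVM; that stage is left out here.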
Bibliographic reference. Hanani, Abualsoud / Carey, Michael / Russell, Martin J. (2010): "Improved language recognition using mixture components statistics", In INTERSPEECH-2010, 741-744.