11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Improved Language Recognition Using Mixture Components Statistics

Abualsoud Hanani, Michael Carey, Martin J. Russell

University of Birmingham, UK

One successful approach to language recognition is to focus on the most discriminative high-level features of languages, such as phones and words. In this paper, we apply a similar approach to acoustic features, using a single GMM tokenizer followed by discriminatively trained language models. A feature selection technique based on the Support Vector Machine (SVM) is used to model higher-order n-grams. Three different ways of building this tokenizer are explored and compared using discriminative uni-gram and generative GMM-UBM systems. A discriminative uni-gram system using a very large GMM tokenizer with 24,576 components yields an EER of 1.66%, falling to 0.71% when fused with other acoustic approaches, on the NIST'03 LRE 30s evaluation.
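As a rough illustration of the GMM-tokenizer idea described above, the following minimal sketch maps each acoustic frame to the index of its best-scoring mixture component and then summarises an utterance as a vector of component relative frequencies (a uni-gram statistic). It assumes top-1 component decoding and a diagonal-covariance GMM; the abstract does not state these details, and the function names and toy parameter values are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def tokenize(frames, weights, means, variances):
    """Map each frame to the index of its highest-scoring mixture component.

    Assumption: top-1 decoding; the paper may use a different rule.
    """
    tokens = []
    for x in frames:
        scores = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        tokens.append(max(range(len(scores)), key=scores.__getitem__))
    return tokens


def unigram_vector(tokens, n_components):
    """Relative frequency of each component index over the utterance."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[k] / total for k in range(n_components)]


# Toy 2-component, 2-dimensional GMM (hypothetical values for illustration;
# the paper's largest tokenizer has 24,576 components).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
frames = [[0.1, -0.2], [4.8, 5.1], [5.2, 4.9], [0.3, 0.0]]

tokens = tokenize(frames, weights, means, variances)
features = unigram_vector(tokens, len(weights))
```

In a system of the kind the abstract outlines, one such vector per utterance would serve as input to a discriminatively trained classifier such as an SVM; higher-order n-grams would count sequences of consecutive tokens instead of single tokens.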


Bibliographic reference. Hanani, Abualsoud / Carey, Michael / Russell, Martin J. (2010): "Improved language recognition using mixture components statistics", In INTERSPEECH-2010, 741-744.