This paper studies feature selection in phonotactic language recognition. The phonotactic feature is represented by n-gram statistics, derived from one or more phone recognizers, in the form of high-dimensional feature vectors. Two feature selection strategies are proposed to select n-gram statistics and reduce the dimension of the feature vectors, so that higher-order n-gram features can be adopted in language recognition. With the proposed feature selection techniques, we achieved an equal error rate (EER) of 1.84% with 4-gram statistics on the 30-second closed-set test of the 2007 NIST Language Recognition Evaluation (LRE).
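To make the dimensionality problem concrete, the following minimal sketch builds n-gram feature vectors from a decoded phone sequence and shows how quickly the dense dimension grows with n. For example, a hypothetical 60-phone inventory gives 60^4 = 12,960,000 possible 4-grams per recognizer. The sketch assumes 1-best phone strings rather than lattice-based expected counts. All function names are hypothetical, and the frequency-based `select_by_frequency` rule is an illustrative stand-in only. It is not either of the two selection strategies proposed in the paper.

```python
from collections import Counter


def ngram_counts(phones, n):
    """Count the n-grams in a single decoded (1-best) phone sequence."""
    return Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))


def full_dimension(num_phones, n):
    """Size of a dense n-gram feature vector over a phone inventory."""
    return num_phones ** n


def select_by_frequency(training_seqs, n, k):
    """Hypothetical selection rule (not the paper's): keep the k n-grams
    that occur most often across the training sequences."""
    total = Counter()
    for seq in training_seqs:
        total.update(ngram_counts(seq, n))
    return [gram for gram, _ in total.most_common(k)]


def feature_vector(phones, n, selected):
    """Relative frequencies of the selected n-grams, in a fixed order.
    Unselected n-grams are simply dropped, which is what shrinks the vector."""
    counts = ngram_counts(phones, n)
    total = max(sum(counts.values()), 1)  # avoid division by zero on short inputs
    return [counts[gram] / total for gram in selected]


if __name__ == "__main__":
    train = [["a", "b", "a", "b"], ["a", "b", "c"]]
    selected = select_by_frequency(train, n=2, k=1)
    print(full_dimension(60, 4))                              # 12960000
    print(selected)                                           # [('a', 'b')]
    print(feature_vector(["a", "b", "a", "b"], 2, selected))  # [0.666...]
```

With several phone recognizers, one would compute such a vector per recognizer and concatenate them, which multiplies the dimension again; this is why reducing the number of retained n-grams matters most for higher orders such as 4-grams.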
Bibliographic reference. Tong, Rong / Ma, Bin / Li, Haizhou / Chng, Eng Siong (2010): "Selecting phonotactic features for language recognition", In INTERSPEECH-2010, 737-740.