INTERSPEECH 2011
12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Dimensionality Reduction for Using High-Order n-Grams in SVM-Based Phonotactic Language Recognition

Mikel Penagarikano, Amparo Varona, Luis Javier Rodriguez-Fuentes, Germán Bordel

Universidad del País Vasco, Spain

SVM-based phonotactic language recognition is state-of-the-art technology. However, due to computational bounds, phonotactic information is usually limited to low-order phone n-grams (up to n = 3). In a previous work, we proposed a feature selection algorithm, based on n-gram frequencies, which allowed us to work successfully with high-order n-grams on the NIST 2007 LRE database. In this work, we use two feature projection methods for dimensionality reduction of feature spaces including up to 4-grams: Principal Component Analysis (PCA) and Random Projection. These methods allow us to attain competitive performance even for small feature sets (e.g. of size 500). Systems were built by means of open software (BUT phone decoders, HTK, SRILM, LIBLINEAR and FoCal) and experiments were carried out on the NIST 2009 LRE database. The best performance was attained by using the feature selection algorithm to get around 11500 features: 1.93% EER and CLLR = 0.413. When considering smaller sets of features, PCA provided the best performance. For instance, using PCA to get a 500-dimensional feature subspace yielded 2.15% EER and CLLR = 0.457 (a 25% improvement with respect to using feature selection).
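The two feature projection methods named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how PCA was computed or which random matrix was used, so the SVD-based PCA, the Gaussian random matrix, the function names `pca_project` and `random_project`, and the toy data standing in for n-gram frequency vectors are all assumptions made for illustration.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the top-k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)  # centre each feature
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def random_project(X, k, seed=0):
    """Project the rows of X with a dense Gaussian matrix scaled by 1/sqrt(k)."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ R

# Toy stand-in for n-gram relative-frequency vectors (assumed, not real data):
# 200 utterances, 1000 n-gram features, each row normalised to sum to 1.
rng = np.random.default_rng(42)
X = rng.random((200, 1000))
X /= X.sum(axis=1, keepdims=True)

Z_pca = pca_project(X, 50)    # data-dependent projection
Z_rp = random_project(X, 50)  # data-independent projection
```

Either reduced matrix could then be passed to a linear SVM in place of the full n-gram vectors. PCA needs a decomposition fitted on the training data, whereas random projection only needs a fixed random matrix.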


Bibliographic reference. Penagarikano, Mikel / Varona, Amparo / Rodriguez-Fuentes, Luis Javier / Bordel, Germán (2011): "Dimensionality reduction for using high-order n-grams in SVM-based phonotactic language recognition", in INTERSPEECH-2011, 853-856.