Phonotactic language recognition is one of the major techniques used for automatic recognition of spoken languages. We propose a feature extraction technique based on principal component analysis (PCA) for use with systems based on support vector machines (SVMs). This technique speeds up training, in some cases by a factor of more than 1000, allowing systems to be trained effectively on much larger data sets. The speed-up of the test phase can be even greater, which makes the resulting systems much more useful for processing large amounts of data. We report our results on the NIST LRE 2009 task.
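The general idea of reducing high-dimensional phonotactic features with PCA before classification can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic Poisson counts stand in for phone n-gram counts, and the dimensions (200 utterances, 1000 n-gram types, 50 components), the relative-frequency normalization, and the helper names `fit_pca` and `project` are all assumptions made for illustration. The reduced vectors would then be passed to an SVM in place of the full n-gram vectors.

```python
import numpy as np


def fit_pca(X, n_components):
    """Fit PCA on the rows of X; return the mean and the top principal directions."""
    mean = X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are orthonormal principal directions,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def project(X, mean, components):
    """Map feature vectors to the low-dimensional PCA space."""
    return (X - mean) @ components.T


rng = np.random.default_rng(0)

# Hypothetical data: synthetic counts standing in for phone n-gram counts.
counts = rng.poisson(lam=0.5, size=(200, 1000)).astype(float)

# Assumed normalization: relative frequency of each n-gram within an utterance.
totals = np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
features = counts / totals

mean, components = fit_pca(features, n_components=50)
reduced = project(features, mean, components)

print(features.shape, "->", reduced.shape)
```

Because the classifier then operates on 50-dimensional vectors instead of 1000-dimensional ones, both training and scoring touch far fewer values per utterance, which is the kind of saving the abstract describes; the actual speed-up depends on the real feature dimensionality and the SVM implementation.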
Cite as: Mikolov, T., Plchot, O., Glembek, O., Burget, L., Cernocký, J. (2010) PCA-based Feature Extraction for Phonotactic Language Recognition. Proc. The Speaker and Language Recognition Workshop (Odyssey 2010), paper 42
@inproceedings{mikolov10_odyssey,
  author={Tomás Mikolov and Oldrich Plchot and Ondrej Glembek and Lukás Burget and Jan Cernocký},
  title={{PCA-based Feature Extraction for Phonotactic Language Recognition}},
  year=2010,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2010)},
  pages={paper 42}
}