This paper presents a method of augmenting shifted-delta cepstral coefficients (SDCCs) with the classification outputs of an array of support vector machines (SVMs) trained to detect a set of manner and place features in telephone speech. The SVM array provides a broad phoneme classification of each acoustic frame. Concatenating its outputs with the SDCCs yields a hybrid feature vector for each frame, on which a set of Gaussian mixture models (GMMs) may be trained to perform automatic language identification (LID). The NTIMIT telephone-band speech corpus was used to train the SVM-based distinctive feature recognizers, while the NIST CallFriend telephone corpus was used for training and testing the rest of the system.
Index Terms— Support Vector Machines, Gaussian Mixture Models, Distinctive Features, Language Identification
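The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names are hypothetical, the shifted-delta parameters (d=1, p=3, k=7) are a commonly used configuration rather than values stated here, edge frames are handled by repetition, the SVM outputs are assumed to be one real-valued score per detector per frame, and the GMMs are assumed to have diagonal covariances with the decision taken as the highest average per-frame log-likelihood.

```python
import numpy as np

def shifted_delta(cepstra, d=1, p=3, k=7):
    """Stack k delta blocks, each shifted by p frames, for every frame.

    cepstra: (T, N) array of per-frame cepstral coefficients.
    Block i at frame t is c[t + i*p + d] - c[t + i*p - d].
    Returns a (T, N * k) array. Edges are handled by repeating the
    first and last frames (an assumption, not taken from the paper).
    """
    T = cepstra.shape[0]
    pad = d + (k - 1) * p
    padded = np.pad(cepstra, ((pad, pad), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        start = pad + i * p
        ahead = padded[start + d : start + d + T]
        behind = padded[start - d : start - d + T]
        blocks.append(ahead - behind)
    return np.hstack(blocks)

def hybrid_features(cepstra, svm_outputs, **sdc_params):
    """Concatenate SDCCs with per-frame SVM detector outputs.

    svm_outputs: (T, F) array, one column per distinctive-feature
    detector (assumed to be real-valued scores).
    """
    sdcc = shifted_delta(cepstra, **sdc_params)
    if sdcc.shape[0] != svm_outputs.shape[0]:
        raise ValueError("cepstra and svm_outputs need the same frame count")
    return np.hstack([sdcc, svm_outputs])

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means, variances: (M, D).
    """
    diff = frames[:, None, :] - means[None, :, :]
    log_comp = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )
    # Log-sum-exp over mixture components, computed stably.
    log_mix = np.logaddexp.reduce(log_comp + np.log(weights), axis=1)
    return float(log_mix.mean())

def identify_language(frames, models):
    """Return the language whose GMM scores the utterance highest.

    models: dict mapping a language name to (weights, means, variances).
    """
    scores = {lang: gmm_avg_loglik(frames, *params)
              for lang, params in models.items()}
    return max(scores, key=scores.get)
```

With 7 cepstral coefficients and 5 hypothetical feature detectors, `hybrid_features` returns 7 × 7 + 5 = 54 values per frame. Training the GMM parameters themselves (for example by expectation-maximization) is left out of the sketch.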
Cite as: Harwath, D., Hasegawa-Johnson, M. (2010) Phonetic landmark detection for automatic language identification. Proc. Speech Prosody 2010, paper 231
@inproceedings{harwath10_speechprosody, author={David Harwath and Mark Hasegawa-Johnson}, title={{Phonetic landmark detection for automatic language identification}}, year=2010, booktitle={Proc. Speech Prosody 2010}, pages={paper 231} }