This paper presents a novel subspace-based approach to phonotactic language recognition. The framework comprises two parts: speech feature representation and a subspace-based learning algorithm. First, the phonetic information and contextual relationships contained in spoken utterances are extracted through likelihood computation and feature concatenation during decoding by an automatic speech recognizer. The extracted phone frames are assumed to reside in a lower-dimensional eigen-subspace that approximately captures the structure of the data, so each utterance can be represented by a fixed-dimensional linear subspace. Second, to measure the similarity between two utterances, suitable non-Euclidean metrics are explored and applied to nonlinear discriminant analysis in a kernel fashion, followed by a back-end classifier such as the k-nearest neighbor (k-NN) classifier. Experiments on the OGI-TS database show that the proposed framework outperforms the well-known vector space modeling based method, with relative equal error rate (EER) reductions of 38.90% and 27.13% on the 1-to-50-second and 3-second data sets, respectively.
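The pipeline described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each utterance arrives as a matrix of frame-level features (e.g., phone likelihood vectors), builds a fixed-dimensional subspace per utterance via SVD, uses the projection-kernel similarity (sum of squared cosines of principal angles, one common non-Euclidean metric on subspaces) in place of the paper's full kernel discriminant analysis, and classifies with a k-NN back-end.

```python
import numpy as np

def utterance_subspace(frames, dim=3):
    """Represent an utterance (n_frames x n_features matrix) by an
    orthonormal basis (n_features x dim) of its top principal directions."""
    frames = frames - frames.mean(axis=0)          # center the frames
    _, _, vt = np.linalg.svd(frames, full_matrices=False)
    return vt[:dim].T                              # columns are orthonormal

def projection_similarity(U, V):
    """Projection-kernel similarity between two subspaces with orthonormal
    bases U, V: ||U^T V||_F^2 = sum of squared cosines of principal angles."""
    return np.linalg.norm(U.T @ V, ord="fro") ** 2

def knn_classify(test_basis, train_bases, train_labels, k=3):
    """k-NN back-end: vote among the k training subspaces most similar
    to the test utterance's subspace."""
    sims = [projection_similarity(test_basis, B) for B in train_bases]
    top = np.argsort(sims)[::-1][:k]               # k largest similarities
    votes = [train_labels[i] for i in top]
    return max(set(votes), key=votes.count)
```

For identical subspaces the similarity equals the subspace dimension, and it decays toward zero as the subspaces become orthogonal, which is what makes it usable as a kernel-style affinity for the k-NN vote.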
Index Terms: language recognition, subspace-based learning
Bibliographic reference. Shih, Yu-Chin / Lee, Hung-Shin / Wang, Hsin-Min / Jeng, Shyh-Kang (2012): "Subspace-based feature representation and learning for language recognition", In INTERSPEECH-2012, 2061-2064.