Phonotactic methods for spoken language recognition (SLR) model the permissible phone patterns of a language and their frequencies of occurrence. The phone recognizer followed by vector space model (PR-VSM) system is a state-of-the-art phonotactic language identification system in which each utterance is mapped to a supervector of likelihood scores for n-gram tokens (bag-of-n-grams). However, the bag-of-n-grams language model captures long-context co-occurrence relations poorly because of its restrictive matching of n-gram phonemes, and it is vulnerable to the insertion and deletion errors introduced by the front-end phone recognizer. In this paper, we propose a novel approach that addresses these gaps using a time-gap-weighted lattice kernel (TGWLK). The kernel is an inner product in the feature space generated by all contiguous and non-contiguous subsequences of varying length in the lattice, each weighted by a factor that decays exponentially with its time-gap length. Experiments on the NIST 2009 LRE corpus demonstrate that the proposed TGWLK reduces the equal error rate (EER) relative to the baseline system.
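To make the kernel idea concrete, the following is a minimal sketch of a gap-weighted subsequence kernel, not the TGWLK as implemented in the paper. It rests on several simplifying assumptions: it works on 1-best phone strings rather than lattices, it measures gaps in skipped positions rather than in time, and it omits lattice posterior weighting. The function names and the parameters `max_len` and `decay` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def gap_weighted_features(phones, max_len=2, decay=0.5):
    """Map a phone sequence to {subsequence: weight}.

    Every contiguous or non-contiguous subsequence of length 1..max_len
    contributes decay**gaps, where gaps is the number of skipped positions
    inside its span (an assumed stand-in for the time gap in a lattice).
    """
    features = defaultdict(float)
    for n in range(1, max_len + 1):
        for idx in combinations(range(len(phones)), n):
            gaps = (idx[-1] - idx[0] + 1) - n  # skipped positions in the span
            features[tuple(phones[i] for i in idx)] += decay ** gaps
    return features


def gap_weighted_kernel(a, b, max_len=2, decay=0.5):
    """Inner product of the two gap-weighted feature maps."""
    fa = gap_weighted_features(a, max_len, decay)
    fb = gap_weighted_features(b, max_len, decay)
    return sum(w * fb[u] for u, w in fa.items() if u in fb)


# The second sequence drops the middle phone, as a deletion error might.
reference = ["k", "ae", "t"]
with_deletion = ["k", "t"]
print(gap_weighted_kernel(reference, with_deletion))  # 2.5
```

In this toy case the pair ("k", "t") still matches across the deleted phone with a discounted weight of 0.5, whereas a strict bigram comparison would find no shared bigram. The brute-force enumeration is only practical for short sequences and small `max_len`.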
Bibliographic reference. Liu, Wei-Wei / Zhang, Wei-Qiang / Liu, Jia (2014): "Phonotactic language recognition based on time-gap-weighted lattice kernels", In INTERSPEECH-2014, 3022-3026.