Odyssey 2010: The Speaker and Language Recognition Workshop
Brno, Czech Republic
Most common approaches to phonotactic language recognition rely on several independent phone decodings. These decodings are processed and scored in a fully uncoupled way, so their time alignment, and any information that could be extracted from it, is lost. Recently, a new approach to phonotactic language recognition was presented (Penagarikano, ICASSP2010) that takes time alignment into account by considering cross-decoder phone co-occurrences at the frame level, under two language modeling paradigms: smoothed n-grams and Support Vector Machines (SVM). Experiments on the NIST LRE2007 database demonstrated that phone co-occurrence statistics could improve the performance of baseline phonotactic recognizers. In this paper, two variants of the cross-decoder phone co-occurrence SVM-based approach are proposed, by considering: (1) n-grams (up to 3-grams) of phone co-occurrences; and (2) co-occurrences of phone n-grams (up to 3-grams). To evaluate these approaches, open software was used (Brno University of Technology phone decoders, LIBLINEAR and FoCal), and experiments were carried out on the NIST LRE2007 database. Unlike the approaches presented in (Penagarikano, ICASSP2010), the two approaches presented in this paper outperformed the baseline phonotactic system, yielding around 16% relative improvement in terms of EER. The best fused system attained a 1.88% EER (a 30% improvement with regard to the baseline system), which supports the use of cross-decoder dependencies for language modeling.
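The difference between the two proposed variants can be illustrated with a minimal Python sketch. It is an illustrative reading of the abstract, not the authors' implementation: it assumes two decoders whose outputs are given as frame-aligned phone label lists of equal length, merges runs of identical frame-level pairs before counting n-grams, and approximates phone boundaries by label changes. All function names are hypothetical, and the resulting counts would still have to be turned into feature vectors for an SVM.

```python
from collections import Counter
from itertools import groupby


def collapse(seq):
    """Merge runs of identical consecutive items into a single item."""
    return [key for key, _ in groupby(seq)]


def ngrams(seq, n):
    """Return all contiguous n-grams of seq as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def ngrams_of_cooccurrences(frames_a, frames_b, max_n=3):
    """Variant (1), as assumed here: pair the two decoders' labels frame by
    frame, merge repeated pairs, then count n-grams over the pair sequence."""
    pairs = collapse(list(zip(frames_a, frames_b)))
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(pairs, n))
    return counts


def frame_level_ngrams(frames, n):
    """For each frame, the n-gram of phones ending at the phone active in
    that frame, or None while fewer than n phones have been seen.
    Assumption: a label change marks a phone boundary."""
    history, out = [], []
    for phone in frames:
        if not history or phone != history[-1]:
            history.append(phone)
        out.append(tuple(history[-n:]) if len(history) >= n else None)
    return out


def cooccurrences_of_ngrams(frames_a, frames_b, max_n=3):
    """Variant (2), as assumed here: build phone n-grams per decoder first,
    then count how often two n-grams are active in the same frame."""
    counts = Counter()
    for n in range(1, max_n + 1):
        grams_a = frame_level_ngrams(frames_a, n)
        grams_b = frame_level_ngrams(frames_b, n)
        for ga, gb in zip(grams_a, grams_b):
            if ga is not None and gb is not None:
                counts[(ga, gb)] += 1
    return counts


# Toy frame-level decodings from two hypothetical decoders.
decoder_a = ["a", "a", "b", "b", "c"]
decoder_b = ["x", "y", "y", "z", "z"]

variant_1 = ngrams_of_cooccurrences(decoder_a, decoder_b)
variant_2 = cooccurrences_of_ngrams(decoder_a, decoder_b)
```

In this sketch, variant (1) treats each cross-decoder pair as a single symbol and models sequences of such symbols, whereas variant (2) keeps each decoder's own phone context and only couples the decoders through frame-level overlap.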
Bibliographic reference. Penagarikano, Mikel / Varona, Amparo / Rodriguez-Fuentes, Luis Javier / Bordel, German (2010): "Improved Modeling of Cross-Decoder Phone Co-occurrences in SVM-based Phonotactic Language Recognition", In Odyssey-2010, paper 040.