In common approaches to phonotactic language recognition, the decodings produced by different phone decoders are processed and scored independently of one another, so their time alignment is completely lost. We recently presented a new approach to phonotactic language recognition that takes time alignment information into account by considering cross-decoder co-occurrences of phones or phone n-grams at the frame level. In this work, the approach based on cross-decoder co-occurrences of phone n-grams is further developed and evaluated. Systems were built using open software, and experiments were carried out on the NIST LRE2007 database. A system based on co-occurrences of phone n-grams (up to 4-grams) outperformed the baseline phonotactic system, yielding around 8% relative improvement in terms of EER. The best fused system attained 1.90% EER, which supports the use of cross-decoder dependencies for improved language modeling.
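As an illustration of what frame-level cross-decoder co-occurrence counting could look like, the following is a minimal sketch and not the authors' implementation. It assumes each decoder outputs one phone label per frame, that both label sequences cover the same frames, and that runs of identical consecutive labels form one phone segment. The function names `segment_ngrams_per_frame` and `cooccurrence_counts` are hypothetical, and the exact definition of n-gram co-occurrence used in the paper may differ.

```python
from collections import Counter
from itertools import groupby


def segment_ngrams_per_frame(frames, n):
    """Map each frame to the phone n-gram ending at the segment covering it.

    Assumption: consecutive identical labels belong to one phone segment.
    Frames covered by one of the first n-1 segments get None, because no
    full n-gram ends there yet.
    """
    # Collapse the frame labels into (phone, duration_in_frames) segments.
    segments = [(phone, len(list(run))) for phone, run in groupby(frames)]
    phones = [phone for phone, _ in segments]

    per_frame = []
    for i, (_, length) in enumerate(segments):
        gram = tuple(phones[i - n + 1 : i + 1]) if i >= n - 1 else None
        per_frame.extend([gram] * length)
    return per_frame


def cooccurrence_counts(frames_a, frames_b, n=1):
    """Count how many frames each pair of n-grams (one per decoder) shares."""
    if len(frames_a) != len(frames_b):
        raise ValueError("both decodings must cover the same number of frames")

    grams_a = segment_ngrams_per_frame(frames_a, n)
    grams_b = segment_ngrams_per_frame(frames_b, n)

    counts = Counter()
    for gram_a, gram_b in zip(grams_a, grams_b):
        # Skip frames where either decoder has no complete n-gram yet.
        if gram_a is not None and gram_b is not None:
            counts[(gram_a, gram_b)] += 1
    return counts


# Toy decodings of the same six frames by two hypothetical decoders.
decoder_a = ["a", "a", "b", "b", "b", "c"]
decoder_b = ["x", "x", "x", "y", "y", "y"]

unigram_counts = cooccurrence_counts(decoder_a, decoder_b, n=1)
bigram_counts = cooccurrence_counts(decoder_a, decoder_b, n=2)
```

In a setup like the one described, counts of this kind would be normalized and used as features for an SVM classifier alongside the usual per-decoder n-gram counts; that step is omitted here.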
Bibliographic reference. Penagarikano, Mikel / Varona, Amparo / Rodriguez-Fuentes, Luis Javier / Bordel, German (2010): "Using cross-decoder co-occurrences of phone n-grams in SVM-based phonotactic language recognition", In INTERSPEECH-2010, 745-748.