11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Using Cross-Decoder Co-Occurrences of Phone N-Grams in SVM-Based Phonotactic Language Recognition

Mikel Penagarikano, Amparo Varona, Luis Javier Rodriguez-Fuentes, German Bordel

UPV/EHU (University of the Basque Country), Spain

In common approaches to phonotactic language recognition, the decodings produced by different phone decoders are processed and scored independently, so their time alignment is completely lost. We recently presented a new approach to phonotactic language recognition that takes time alignment information into account by considering cross-decoder co-occurrences of phones or phone n-grams at the frame level. In this work, the approach based on cross-decoder co-occurrences of phone n-grams is further developed and evaluated. Systems were built by means of open software and experiments were carried out on the NIST LRE2007 database. A system based on co-occurrences of phone n-grams (up to 4-grams) outperformed the baseline phonotactic system, yielding around 8% relative improvement in terms of EER. The best fused system attained 1.90% EER, which supports the use of cross-decoder dependencies for improved language modeling.
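
As a rough illustration of the frame-level co-occurrence idea summarized above, the following Python sketch counts how often each pair of phone labels from two decoders is active in the same frame. The segment format (phone, start frame, end frame), the function names, and the toy phone labels are assumptions made for this example and are not taken from the paper. The sketch covers only single-phone co-occurrences; the phone n-gram extension and the SVM classification stage are not shown.

```python
from collections import Counter


def expand_to_frames(segments):
    """Turn (phone, start_frame, end_frame) segments into one label per frame.

    The end frame is exclusive, so a segment ("a", 0, 3) covers frames 0, 1, 2.
    """
    frames = []
    for phone, start, end in segments:
        frames.extend([phone] * (end - start))
    return frames


def cooccurrence_counts(decoding_a, decoding_b):
    """Count frame-level co-occurrences of phone labels from two decoders.

    zip() stops at the shorter sequence, so trailing frames covered by only
    one decoder are ignored (an assumption of this sketch).
    """
    frames_a = expand_to_frames(decoding_a)
    frames_b = expand_to_frames(decoding_b)
    return Counter(zip(frames_a, frames_b))


def relative_frequencies(counts):
    """Normalize raw counts so that they sum to one."""
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


# Toy decodings of the same 5-frame utterance by two hypothetical decoders.
decoder_a = [("a", 0, 3), ("t", 3, 5)]
decoder_b = [("e", 0, 2), ("d", 2, 5)]

counts = cooccurrence_counts(decoder_a, decoder_b)
features = relative_frequencies(counts)
```

In this toy case the pair ("a", "e") is active in two frames, ("a", "d") in one, and ("t", "d") in two. Normalized counts of this kind could serve as entries of a feature vector, in the same way that n-gram frequencies are used in a single-decoder phonotactic system.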

