This paper presents a simple approach to phonotactic language recognition that uses lattices of time-synchronous cross-decoder phone co-occurrences at the frame level. In previous work we successfully applied cross-decoder information, but only through statistics of n-grams extracted from 1-best phone strings. This work fully explains and develops the method for building and properly using lattices of cross-decoder phone co-occurrences. The approach was evaluated with open software (Brno University of Technology phone decoders, HTK, SRILM, LIBLINEAR and FoCal), and experiments were carried out on the 2007 NIST LRE database. The proposed approach outperformed the baseline phonotactic systems both with n-grams up to n=3 (around 13% relative improvement) and with n-grams up to n=4 (around 7% relative improvement). In both cases, the best results were obtained by considering the m=400 most likely cross-decoder co-occurrences: 1.29% EER and CLLR = 0.203. The fusion of the baseline system with the proposed approach yielded 1.22% EER and CLLR = 0.203 (18% and 15% relative improvements, respectively) for n=3, and 1.17% EER and CLLR = 0.197 (15% and 10% relative improvements, respectively) for n=4, outperforming state-of-the-art phonotactic systems on the same task.
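To make the notion of frame-level cross-decoder co-occurrences concrete, the following minimal Python sketch accumulates expected counts of phone pairs from two decoders that are aligned frame by frame. It is an illustration under stated assumptions, not the procedure of the paper: the per-frame phone posteriors are assumed to have been extracted from each decoder's lattice beforehand, the weight of a pair is assumed to be the product of the two posteriors, and the pruning parameter `m` is assumed to act per frame. The function name `expected_cooccurrence_counts` and the input layout are hypothetical.

```python
from collections import defaultdict


def expected_cooccurrence_counts(post_a, post_b, m=None):
    """Accumulate expected counts of cross-decoder phone pairs.

    post_a, post_b: one dict per frame, mapping a phone label to its
    posterior probability (assumed to come from each decoder's lattice).
    m: optional number of most likely pairs kept per frame (assumption).
    """
    if len(post_a) != len(post_b):
        raise ValueError("decoders must be time-synchronous (same frame count)")

    counts = defaultdict(float)
    for frame_a, frame_b in zip(post_a, post_b):
        # Assumption: the pair weight is the product of the two posteriors.
        pairs = [
            ((phone_a, phone_b), prob_a * prob_b)
            for phone_a, prob_a in frame_a.items()
            for phone_b, prob_b in frame_b.items()
        ]
        # Keep only the m most likely co-occurrences in this frame.
        pairs.sort(key=lambda item: item[1], reverse=True)
        if m is not None:
            pairs = pairs[:m]
        for pair, weight in pairs:
            counts[pair] += weight
    return dict(counts)


# Toy input: two frames, two hypothetical decoders with different phone sets.
post_a = [{"a": 0.9, "e": 0.1}, {"a": 0.5, "e": 0.5}]
post_b = [{"A": 0.8, "O": 0.2}, {"A": 1.0}]
counts = expected_cooccurrence_counts(post_a, post_b)
```

In a complete system, each phone pair would be treated as a single symbol, and n-gram statistics over such symbols would feed the SVM classifier. That step is omitted here.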
Bibliographic reference. Varona, Amparo / Penagarikano, Mikel / Rodriguez-Fuentes, Luis Javier / Bordel, Germán (2011): "On the use of lattices of time-synchronous cross-decoder phone co-occurrences in a SVM-phonotactic language recognition system", In INTERSPEECH-2011, 2901-2904.