On the use of phone-gram units in recurrent neural networks for language identification

Christian Salamea, Luis Fernando D'Haro, Ricardo Cordoba, Rubén San-Segundo


In this paper we present results on using RNN-based language model (RNNLM) scores trained on different phone-gram orders and obtained from different phonetic ASR recognizers. In order to avoid data-sparseness problems and to reduce the vocabulary of all possible n-gram combinations, a K-means clustering procedure was performed using phone-vector embeddings as a pre-processing step. Additional experiments to optimize the number of classes, batch size, number of hidden neurons, and state unfolding are also presented. We have worked with the KALAKA-3 database for the plenty-closed condition [1]. Thanks to our clustering technique and the combination of higher-order phone-grams, our phonotactic system performs ~13% better than the unigram-based RNNLM system. In addition, the obtained RNNLM scores are calibrated and fused with scores from an acoustic-based i-vector system and a traditional PPRLM system. This fusion yields further improvements, showing that the RNNLM scores contribute complementary information to the LID system.
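The clustering step described above can be illustrated with a minimal sketch (not the authors' implementation): phone embeddings are grouped by K-means into a small number of classes, and phone-gram units are then built over class labels rather than over the full phone inventory, shrinking the n-gram vocabulary. The 2-D "embeddings" below are synthetic placeholders; real phone-vector embeddings would come from the phonetic recognizers described in the paper.

```python
# Hedged sketch: K-means over toy phone embeddings, then class-level bigrams.

def kmeans(vectors, k, iters=20):
    # Deterministic toy initialization: first k vectors as centroids.
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vectors):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign

# Toy phone inventory with hypothetical 2-D embeddings (vowels vs. stops).
phones = ["a", "e", "i", "p", "t", "k"]
vecs = [(0.0, 0.0), (0.1, 0.1), (0.05, 0.05), (1.0, 1.0), (1.1, 1.1), (0.9, 0.9)]
labels = kmeans([list(v) for v in vecs], k=2)
phone2class = dict(zip(phones, labels))

# A decoded phone sequence maps to a class sequence; phone-gram units
# (here, bigrams) are then drawn from the much smaller class vocabulary.
seq = ["p", "a", "t", "e"]
classes = [phone2class[p] for p in seq]
bigrams = list(zip(classes, classes[1:]))
```

With k classes instead of |phones| symbols, the number of possible n-grams drops from |phones|^n to k^n, which is the sparseness reduction the pre-processing step aims at.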


DOI: 10.21437/Odyssey.2016-17

Cite as

Salamea, C., D'Haro, L.F., Cordoba, R., San-Segundo, R. (2016) On the use of phone-gram units in recurrent neural networks for language identification. Proc. Odyssey 2016, 117-123.

Bibtex
@inproceedings{Salamea+2016,
author={Christian Salamea and Luis Fernando D'Haro and Ricardo Cordoba and Rubén San-Segundo},
title={On the use of phone-gram units in recurrent neural networks for language identification},
year=2016,
booktitle={Odyssey 2016},
doi={10.21437/Odyssey.2016-17},
url={http://dx.doi.org/10.21437/Odyssey.2016-17},
pages={117--123}
}