This work explores the use of Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) for automatic language identification (LID). The use of RNNs is motivated by their superior ability to model sequences compared with the feed-forward networks used in previous work. We show that LSTM RNNs can effectively exploit temporal dependencies in acoustic data, learning features relevant for language discrimination. The proposed approach is compared to baseline i-vector and feed-forward Deep Neural Network (DNN) systems on the NIST Language Recognition Evaluation 2009 dataset. We show that LSTM RNNs achieve better performance than our best DNN system with an order of magnitude fewer parameters. Further, combining the different systems leads to significant performance improvements (up to 28%).
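The temporal modeling ability the abstract attributes to LSTMs comes from their gated cell state, which lets the network retain or discard information across time steps. Below is a minimal, self-contained sketch of a single-unit LSTM cell run over a toy feature sequence; the weight names and values are illustrative assumptions, not the paper's actual implementation, which used multi-dimensional acoustic features and learned weights.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, W):
    """One LSTM time step for a scalar input and scalar hidden state.
    W holds weights for the input (i), forget (f), and output (o)
    gates and the candidate cell value (g)."""
    i = sigmoid(W["wi"] * x + W["ui"] * h + W["bi"])   # input gate
    f = sigmoid(W["wf"] * x + W["uf"] * h + W["bf"])   # forget gate
    o = sigmoid(W["wo"] * x + W["uo"] * h + W["bo"])   # output gate
    g = math.tanh(W["wg"] * x + W["ug"] * h + W["bg"])  # candidate cell
    c = f * c + i * g          # cell state carries long-term memory
    h = o * math.tanh(c)       # hidden state exposes gated memory
    return h, c

# Illustrative random weights (a real system learns these).
random.seed(0)
W = {k: random.uniform(-0.5, 0.5)
     for k in ("wi", "ui", "bi", "wf", "uf", "bf",
               "wo", "uo", "bo", "wg", "ug", "bg")}

# Run the cell over a toy "acoustic" feature sequence; the final
# hidden state summarizes the utterance for language classification.
h, c = 0.0, 0.0
for x in [0.1, 0.4, -0.2, 0.3]:
    h, c = lstm_step(x, h, c, W)
print(h)
```

In an LID system along the lines described here, such hidden states (with vector-valued inputs, many units, and stacked layers) would feed a softmax over the target languages.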
Bibliographic reference. Gonzalez-Dominguez, Javier / Lopez-Moreno, Ignacio / Sak, Haşim / Gonzalez-Rodriguez, Joaquin / Moreno, Pedro J. (2014): "Automatic language identification using long short-term memory recurrent neural networks", in Proc. INTERSPEECH 2014, pp. 2155-2159.