Language identification (LID) systems typically employ i-vectors as fixed-length representations of utterances. However, it may not be possible to reliably estimate i-vectors from short utterances, which in turn could reduce language identification accuracy. Recently, Long Short-Term Memory networks (LSTMs) have been shown to model short utterances better in the context of language identification. This paper explores the use of bidirectional LSTMs for language identification, with the aim of modelling temporal dependencies between past and future frame-based features in short utterances. Specifically, an end-to-end system for short-duration language identification employing bidirectional LSTM models of utterances is proposed. Evaluations on both the NIST 2007 and 2015 LRE show state-of-the-art performance.
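To make the bidirectional idea concrete, the following is a minimal sketch in plain Python and not the authors' implementation. It assumes a single-layer LSTM per direction, mean pooling over time, and a softmax output layer. None of these choices are specified in the abstract, and all names (`init_lstm`, `lstm_run`, `bilstm_language_posteriors`), dimensions, and random weights are hypothetical. The point it illustrates is that the backward pass lets the representation of each frame depend on future frames as well as past ones.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_dim, hidden_dim, rng):
    """Random weights for one direction: 4 gates, each hidden_dim x (input + hidden + bias)."""
    cols = input_dim + hidden_dim + 1  # +1 column for the bias term
    return [[[rng.uniform(-0.1, 0.1) for _ in range(cols)]
             for _ in range(hidden_dim)]
            for _ in range(4)]


def lstm_run(frames, weights, hidden_dim):
    """Run a standard LSTM over a list of frames and return the hidden state per frame."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    outputs = []
    for x in frames:
        z = list(x) + h + [1.0]  # input, previous hidden state, bias
        pre = [[sum(w * v for w, v in zip(row, z)) for row in gate]
               for gate in weights]
        i = [sigmoid(p) for p in pre[0]]    # input gate
        f = [sigmoid(p) for p in pre[1]]    # forget gate
        o = [sigmoid(p) for p in pre[2]]    # output gate
        g = [math.tanh(p) for p in pre[3]]  # candidate cell update
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        outputs.append(h)
    return outputs


def bilstm_language_posteriors(frames, fwd, bwd, out_w, hidden_dim):
    """Map a sequence of frame-level features to language posteriors."""
    fwd_states = lstm_run(frames, fwd, hidden_dim)
    # The backward direction reads the utterance in reverse; its outputs are
    # flipped back so that index t lines up with frame t.
    bwd_states = lstm_run(frames[::-1], bwd, hidden_dim)[::-1]
    joint = [a + b for a, b in zip(fwd_states, bwd_states)]
    # Assumption: mean pooling over time gives the utterance-level vector.
    pooled = [sum(col) / len(joint) for col in zip(*joint)]
    logits = [sum(w * v for w, v in zip(row, pooled)) for row in out_w]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy setup: 20 frames of 3-dimensional features, 5 target languages.
rng = random.Random(0)
input_dim, hidden_dim, num_languages = 3, 4, 5
fwd = init_lstm(input_dim, hidden_dim, rng)
bwd = init_lstm(input_dim, hidden_dim, rng)
out_w = [[rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_dim)]
         for _ in range(num_languages)]
frames = [[rng.gauss(0.0, 1.0) for _ in range(input_dim)] for _ in range(20)]
posteriors = bilstm_language_posteriors(frames, fwd, bwd, out_w, hidden_dim)
```

With untrained random weights the posteriors are close to uniform. In a real system the weights would be learned end-to-end from labelled utterances, and the inputs would be acoustic features rather than random numbers.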
Cite as: Fernando, S., Sethu, V., Ambikairajah, E., Epps, J. (2017) Bidirectional Modelling for Short Duration Language Identification. Proc. Interspeech 2017, 2809-2813, doi: 10.21437/Interspeech.2017-286
@inproceedings{fernando17_interspeech,
  author={Sarith Fernando and Vidhyasaharan Sethu and Eliathamby Ambikairajah and Julien Epps},
  title={{Bidirectional Modelling for Short Duration Language Identification}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2809--2813},
  doi={10.21437/Interspeech.2017-286}
}