Investigating Scalability in Hierarchical Language Identification System

Saad Irtza, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Haizhou Li


State-of-the-art language identification (LID) systems are not easily scalable to accommodate new languages. Specifically, as the number of target languages grows, the error rate of these LID systems increases rapidly. This paper addresses this challenge by adopting a hierarchical language identification (HLID) framework. We demonstrate the superior scalability of the HLID framework. In particular, when new languages are added, HLID only requires the training of relevant nodes in a hierarchical structure instead of re-training the entire tree. Experiments conducted on a dataset that combines languages from the NIST LRE 2007, 2009, 2011 and 2015 databases show that as the number of target languages grows from 28 to 42, the performance of a single-level (non-hierarchical) system deteriorates by around 11%, while that of the hierarchical system deteriorates by only about 3.4% in terms of Cavg. Finally, experiments also suggest that SVM-based systems are more scalable than GPLDA-based systems.


DOI: 10.21437/Interspeech.2017-596

Cite as: Irtza, S., Sethu, V., Ambikairajah, E., Li, H. (2017) Investigating Scalability in Hierarchical Language Identification System. Proc. Interspeech 2017, 2581-2585, DOI: 10.21437/Interspeech.2017-596.


@inproceedings{Irtza2017,
  author={Saad Irtza and Vidhyasaharan Sethu and Eliathamby Ambikairajah and Haizhou Li},
  title={Investigating Scalability in Hierarchical Language Identification System},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2581--2585},
  doi={10.21437/Interspeech.2017-596},
  url={http://dx.doi.org/10.21437/Interspeech.2017-596}
}