Due to the limitation of single-level classification, existing fusion techniques experience difficulty in improving the performance of language identification when the number of languages and features are further increased. Given that the similarity of feature distribution between different languages may vary, we propose a novel hierarchical language identification framework with multi-level classification. In this approach, target languages are hierarchically clustered into groups according to the distance between them, models are trained both for individual languages and language groups, and classification is hierarchically done in multi-levels. This framework is implemented and evaluated in this paper, the results showing an relative 15.1% error-rate improvement in 30s case on OGI 10-language database compared to modern GMM fusion system.
Bibliographic reference. Yin, Bo / Ambikairajah, Eliathamby / Chen, Fang (2007): "Hierarchical language identification based on automatic language clustering", In INTERSPEECH-2007, 178-181.