8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Hierarchical Language Identification Based on Automatic Language Clustering

Bo Yin (1), Eliathamby Ambikairajah (1), Fang Chen (2)

(1) University of New South Wales, Australia
(2) NICTA, Australia

Because single-level classification is limited, existing fusion techniques struggle to improve language identification performance as the number of languages and features increases. Given that the similarity of feature distributions varies between languages, we propose a novel hierarchical language identification framework with multi-level classification. In this approach, target languages are hierarchically clustered into groups according to the distance between them, models are trained for both individual languages and language groups, and classification proceeds hierarchically over multiple levels. The framework is implemented and evaluated in this paper; the results show a relative error-rate improvement of 15.1% in the 30s case on the OGI 10-language database compared to a modern GMM fusion system.
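The cluster-then-classify idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: each language is reduced to a single mean feature vector, the distance is Euclidean, groups are merged by average linkage, and scoring is nearest-mean rather than GMM likelihood. The names `cluster_languages`, `classify`, and the toy languages `lang_a` to `lang_d` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def cluster_languages(lang_means, n_groups):
    """Agglomerative average-linkage clustering (illustrative assumption).

    Starts with one group per language and repeatedly merges the two
    closest groups until only n_groups remain.
    """
    groups = [[name] for name in sorted(lang_means)]

    def group_distance(g1, g2):
        pairs = [(a, b) for a in g1 for b in g2]
        total = sum(euclidean(lang_means[a], lang_means[b]) for a, b in pairs)
        return total / len(pairs)

    while len(groups) > n_groups:
        i, j = min(
            ((i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))),
            key=lambda ij: group_distance(groups[ij[0]], groups[ij[1]]),
        )
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups


def classify(sample, lang_means, groups):
    """Two-level decision: pick the nearest group, then the nearest language in it."""
    group_means = [mean_vector([lang_means[l] for l in g]) for g in groups]
    best = min(range(len(groups)), key=lambda k: euclidean(sample, group_means[k]))
    return min(groups[best], key=lambda l: euclidean(sample, lang_means[l]))


# Toy data (hypothetical): two pairs of mutually similar languages.
LANG_MEANS = {
    "lang_a": (0.0, 0.0),
    "lang_b": (1.0, 0.0),
    "lang_c": (10.0, 10.0),
    "lang_d": (11.0, 10.0),
}

GROUPS = cluster_languages(LANG_MEANS, n_groups=2)
PREDICTED = classify((10.8, 9.9), LANG_MEANS, GROUPS)
print(GROUPS)
print(PREDICTED)
```

In this toy setup the group-level decision only has to separate two well-spaced clusters, and the language-level decision is then made among the similar languages inside the chosen group. A real system would presumably replace the mean vectors with trained models for each language and each group.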


Bibliographic reference. Yin, Bo / Ambikairajah, Eliathamby / Chen, Fang (2007): "Hierarchical language identification based on automatic language clustering", in INTERSPEECH-2007, 178-181.