In this paper, we adopt the boosting framework to improve the performance of acoustic-based Gaussian mixture model (GMM) Language Identification (LID) systems. We introduce a set of low-complexity, boosted target and anti-models that are estimated from training data to improve class separation, and these models are integrated during the LID backend process. This results in a fast estimation process. Experiments were performed on the 12-language, NIST 2003 language recognition evaluation classification task using a GMM-acoustic-score-only LID system, as well as the one that combines GMM acoustic scores with sequence language model scores from GMM tokenization. Classification errors were reduced from 18.8% to 10.5% on the acoustic-score-only system, and from 11.3% to 7.8% on the combined acoustic and tokenization system.
Bibliographic reference. Yang, Xi / Siu, Man-hung / Gish, Herbert / Mak, Brian (2007): "Boosting with anti-models for automatic language identification", In INTERSPEECH-2007, 342-345.