Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Minimum Classification Error Training of Hidden Markov Models for Acoustic Language Identification

Josef G. Bauer, Ekaterina Timoshenko

Siemens AG, Germany

The goal of acoustic Language Identification (LID) is to identify the language of spoken utterances. The described system is based on parallel Hidden Markov Model (HMM) phoneme recognizers. The standard approach to estimating HMM parameters is Maximum Likelihood (ML) estimation, which is not directly related to the classification error rate. Based on the Minimum Classification Error (MCE) parameter estimation scheme, we introduce Minimum Language Identification Error (MLIDE) training, which yields HMM parameters (mean vectors) that minimize the classification error on the training data. Using a large telephone speech corpus with 7 languages, we achieve a language classification error rate of 4.7%, which is a 40% reduction in error rate compared with a baseline system using ML-trained HMMs. Even when the system trained on fixed-network telephone speech is applied to mobile-network speech data, MLIDE greatly improves system performance.
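To make the contrast between ML and MCE training concrete, the following is a minimal sketch of the smoothed classification-error loss commonly used in MCE training: a misclassification measure that compares the correct class score against a soft maximum over competitors, passed through a sigmoid. It is an illustration under stated assumptions, not the MLIDE formulation of the paper, which the abstract does not spell out. The function names, the smoothing parameters `eta` and `gamma`, and the example scores are all hypothetical; the scores are assumed to be per-language log-likelihoods such as parallel phoneme recognizers might produce.

```python
import math


def misclassification_measure(scores, correct, eta=1.0):
    """Smoothed misclassification measure for one utterance.

    d = -g_correct + (1/eta) * log(mean over j != correct of exp(eta * g_j))

    scores  : per-language discriminant scores (assumed log-likelihoods)
    correct : index of the true language
    eta     : hypothetical smoothing parameter; large eta approaches the
              score of the single best competing language
    d < 0 means the utterance is classified correctly.
    """
    competitors = [g for j, g in enumerate(scores) if j != correct]
    m = max(competitors)  # subtract the maximum for numerical stability
    mean_exp = sum(math.exp(eta * (g - m)) for g in competitors) / len(competitors)
    soft_max_competitor = m + math.log(mean_exp) / eta
    return soft_max_competitor - scores[correct]


def mce_loss(scores, correct, gamma=1.0, eta=1.0):
    """Sigmoid of the misclassification measure.

    This is a differentiable stand-in for the 0/1 classification error,
    written with tanh so that large |d| does not overflow.
    """
    d = misclassification_measure(scores, correct, eta)
    return 0.5 * (1.0 + math.tanh(0.5 * gamma * d))


# Hypothetical scores for three languages (illustrative values only).
example_scores = [-10.0, -12.0, -15.0]
loss_if_first_is_true = mce_loss(example_scores, correct=0)  # best score is correct
loss_if_last_is_true = mce_loss(example_scores, correct=2)   # worst score is correct
```

Averaging this loss over the training utterances gives an objective that approximates the training-set error rate and can be minimized by gradient descent over the model parameters (in the paper's case, the HMM mean vectors). ML estimation, by contrast, raises the likelihood of each model on its own data without regard to the competing models.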


Bibliographic reference. Bauer, Josef G. / Timoshenko, Ekaterina (2006): "Minimum classification error training of hidden Markov models for acoustic language identification", in INTERSPEECH-2006, paper 1981-Mon2CaP.2.