Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Unsupervised Adaptation for Acoustic Language Identification

Ekaterina Timoshenko, Josef G. Bauer

Siemens AG, Germany

Our system for automatic language identification (LID) of spoken utterances uses language-dependent parallel phoneme recognition (PPR) with Hidden Markov Model (HMM) phoneme recognizers and optional phoneme language models (LMs). Such a LID system for continuous speech requires many hours of orthographically transcribed data to train the language-dependent HMMs and LMs, as well as phonetic lexica for every language considered (supervised training). To avoid the time-consuming process of obtaining orthographically transcribed training material, we propose an algorithm for automatic unsupervised adaptation that requires only raw audio data covering the requested language and acoustic environment as training material. The LID system was trained and evaluated using fixed and mobile network databases (DBs) from the SpeechDat II corpus. The baseline system, based on supervised training using fixed network databases and covering 4 languages, achieved a LID error rate of 6.7% for fixed data and 19.5% for mobile data. With unsupervised adaptation of the HMMs trained on fixed network data, the error rate for mobile DBs (the database mismatch condition) is reduced to 10.6%. To explore the situation in which orthographically transcribed training data is not available at all, multilingual HMMs were adapted without supervision to fixed and mobile DBs and achieve error rates of 10.8% and 12.4%, respectively.
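The PPR decision step described above can be illustrated with a minimal sketch. It assumes that each language-dependent recognizer returns a log-likelihood score for the utterance and that an optional phoneme LM score is simply added with a weight. The function name, the score dictionaries, and the `lm_weight` parameter are hypothetical and not taken from the paper, whose actual scoring and normalization may differ.

```python
from typing import Dict, Optional


def identify_language(
    acoustic_scores: Dict[str, float],
    lm_scores: Optional[Dict[str, float]] = None,
    lm_weight: float = 1.0,
) -> str:
    """Pick the language whose recognizer scores the utterance highest.

    acoustic_scores: hypothetical per-language log-likelihoods from the
        parallel HMM phoneme recognizers (higher is better).
    lm_scores: optional per-language phoneme LM log-scores (assumption:
        combined with the acoustic score by weighted addition).
    """
    if not acoustic_scores:
        raise ValueError("at least one language score is required")

    combined = {}
    for lang, acoustic in acoustic_scores.items():
        # A missing LM score contributes nothing to the combined score.
        lm = lm_scores.get(lang, 0.0) if lm_scores else 0.0
        combined[lang] = acoustic + lm_weight * lm

    # The hypothesized language is the one with the highest combined score.
    return max(combined, key=combined.get)
```

For example, `identify_language({"de": -120.0, "en": -110.0})` selects `"en"` on acoustic scores alone, while supplying LM scores can change the decision if they favor another language strongly enough.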

Bibliographic reference.  Timoshenko, Ekaterina / Bauer, Josef G. (2006): "Unsupervised adaptation for acoustic language identification", In INTERSPEECH-2006, paper 1494-Mon2CaP.3.