ISCA Archive Eurospeech 1999

Comparison of two phonetic approaches to language identification

François Pellegrino, Jérôme Farinas, Régine André-Obrecht

This paper presents two unsupervised approaches to Automatic Language Identification (ALI) based on segmental preprocessing. In the Global Segmental Model approach, the language system is modeled by a Gaussian Mixture Model (GMM) trained on automatically detected segments. In the Phonetic Differentiated Model approach, unsupervised vowel/non-vowel detection is performed and the language model is defined by two GMMs, one modeling the vowel segments and the other modeling the remaining segments. Neither approach requires labeled data. GMMs are initialized using an efficient data-driven variant of the LBG algorithm: the LBG-Rissanen algorithm. On 5 languages from the OGI MLTS corpus, in a closed-set identification task, each system reaches 85% correct identification using 45-second utterances from male speakers. Merging the two systems increases this performance to 91%.
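
As an illustration of the closed-set decision described above, the following minimal sketch scores an utterance against one GMM per language and picks the language with the highest log-likelihood. It is not the authors' implementation: the diagonal-covariance form, the toy parameters, the language names `lang_a` and `lang_b`, and all function names are assumptions made for the example, and the segmentation, vowel detection, and LBG-Rissanen initialization steps are not shown. The abstract does not say how the vowel and non-vowel GMM scores are combined, so the sum used in `pdm_score` is likewise an assumption.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Sum over frames of log sum_k w_k N(x | mean_k, var_k).

    gmm is a list of (weight, mean, var) tuples (assumed layout).
    """
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        top = max(terms)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total

def identify(frames, models):
    """Closed-set decision: return the best-scoring language and all scores."""
    scores = {lang: gmm_log_likelihood(frames, gmm) for lang, gmm in models.items()}
    return max(scores, key=scores.get), scores

def pdm_score(vowel_frames, other_frames, vowel_gmm, other_gmm):
    """Two-GMM score for one language.

    Assumption: the two log-likelihoods are simply added.
    """
    return (gmm_log_likelihood(vowel_frames, vowel_gmm)
            + gmm_log_likelihood(other_frames, other_gmm))

# Toy, hand-picked parameters (hypothetical; real models would be trained).
models = {
    "lang_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])],
    "lang_b": [(0.5, [-3.0, 0.0], [1.0, 1.0]), (0.5, [0.0, -3.0], [1.0, 1.0])],
}
frames = [[0.1, -0.2], [2.9, 3.1], [0.3, 0.2]]  # made-up feature vectors

best, scores = identify(frames, models)
print(best, scores)
```

In this sketch a single GMM per language corresponds to the Global Segmental Model setting, while `pdm_score` would be evaluated once per language with that language's vowel and non-vowel GMMs.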


doi: 10.21437/Eurospeech.1999-103

Cite as: Pellegrino, F., Farinas, J., André-Obrecht, R. (1999) Comparison of two phonetic approaches to language identification. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 399-402, doi: 10.21437/Eurospeech.1999-103

@inproceedings{pellegrino99_eurospeech,
  author={François Pellegrino and Jérôme Farinas and Régine André-Obrecht},
  title={{Comparison of two phonetic approaches to language identification}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={399--402},
  doi={10.21437/Eurospeech.1999-103}
}