Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Different Size Multilingual Phone Inventories and Context-Dependent Acoustic Models for Language Identification

Dong Zhu (1), Martine Adda-Decker (1), Fabien Antoine (2)

(1) LIMSI-CNRS, Orsay, France; (2) DGA-CTA, France

Experimental work using phonotactic and syllabotactic approaches for automatic language identification (LID) is presented. Several questions motivated this research. What is the best choice for a multilingual phone inventory? Can a syllabic unit usefully extend the scope of the modeling unit? Can context-dependent (CD) acoustic models, widely used for speech recognition, improve LID accuracy? Can the multilingual acoustic models efficiently process additional languages that differ from the training languages? The LID system is studied experimentally with multilingual phone sets of three sizes: 73, 50 and 35 phones. Experiments are carried out on broadcast news in seven languages (German, English, Arabic, Mandarin, Spanish, French, and Italian), with 140 hours of audio data for training and 7 hours for testing. It is shown that smaller phone inventories achieve higher LID accuracy and that CD models outperform context-independent (CI) models. Further experiments test the generality of both the multilingual acoustic models and the phonotactic methods on another corpus covering 11 + 10 languages (11 known and 10 unknown languages).
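The phonotactic idea can be illustrated in its generic form: a phone recognizer (here, one shared multilingual recognizer) decodes the speech into a phone string, each language has its own phone n-gram model, and the language whose model gives the string the highest likelihood is chosen. The sketch below is a minimal illustration under stated assumptions, not the authors' system: it assumes the phone strings are already decoded, uses a bigram model with add-alpha smoothing, and the function names, language labels and phone symbols are hypothetical.

```python
import math
from collections import Counter

def train_bigram(sequences, vocab, alpha=1.0):
    """Train an add-alpha smoothed phone bigram model (hypothetical helper).

    Returns a function giving log P(cur | prev); cur ranges over the
    phone vocabulary plus the end-of-utterance symbol "</s>".
    """
    bigrams = Counter()
    history = Counter()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            history[prev] += 1
    v = len(vocab) + 1  # phones plus "</s>"

    def logprob(prev, cur):
        return math.log((bigrams[(prev, cur)] + alpha) / (history[prev] + alpha * v))

    return logprob

def score(logprob, seq):
    """Log-likelihood of a decoded phone string under one language's model."""
    padded = ["<s>"] + list(seq) + ["</s>"]
    return sum(logprob(p, c) for p, c in zip(padded, padded[1:]))

def identify(models, seq):
    """Pick the language whose phonotactic model scores the string highest."""
    return max(models, key=lambda lang: score(models[lang], seq))
```

For example, with two toy languages trained on hypothetical phone strings, `identify(models, ["a", "b", "a", "b"])` returns the language whose training data shares those phone transitions. A real system would use larger n-gram orders, better smoothing and many hours of decoded training speech per language.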


Bibliographic reference.  Zhu, Dong / Adda-Decker, Martine / Antoine, Fabien (2005): "Different size multilingual phone inventories and context-dependent acoustic models for language identification", In INTERSPEECH-2005, 2833-2836.