11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Comparing Mono- & Multilingual Acoustic Seed Models for a Low e-Resourced Language: A Case-Study of Luxembourgish

Martine Adda-Decker, Lori Lamel, Natalie D. Snoeren

LIMSI, France

Luxembourgish is embedded in a multilingual context on the divide between Romance and Germanic cultures and has often been viewed as one of Europe's under-resourced languages. We focus on the acoustic modeling of Luxembourgish. Using monolingual acoustic seeds selected from German, French or English model sets via IPA symbol correspondences, we investigated whether Luxembourgish spoken words were globally better represented by one of these languages. Although speech in Luxembourgish is frequently interspersed with French words, forced alignments on these data showed a clear preference for Germanic acoustic models, with only limited usage of French models. German models provided the best match, covering 54% of the data, against 35% for English models and only 11% for French models. A further set of multilingual acoustic models, estimated from the pooled German, French, and English audio data, captured between 27% and 48% of the data depending on conditions.
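As an illustration of how such usage percentages can be derived, the following minimal sketch counts how often each language's model is selected when a forced alignment is free to choose among competing seed models. It assumes the alignment output has already been reduced to one language tag per aligned segment; the function name `language_shares`, the tags `de`, `en`, `fr`, and the toy label sequence are hypothetical and are not taken from the study's data or tools.

```python
from collections import Counter


def language_shares(segment_labels):
    """Return the percentage of aligned segments attributed to each language.

    segment_labels: iterable of language tags, one per aligned segment
    (hypothetical format; a real aligner's output would need parsing first).
    """
    counts = Counter(segment_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: 100.0 * n / total for lang, n in counts.items()}


# Toy, invented labels for illustration only (not the paper's data).
toy_alignment = ["de", "de", "en", "fr", "de", "en", "de", "en"]
shares = language_shares(toy_alignment)

for lang, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{lang}: {pct:.1f}%")
```

Counting per segment is only one possible choice; weighting each segment by its duration would give shares in terms of speech time rather than segment count.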

