Advances in corpus-based approaches and machine learning techniques have fostered the development of language technologies for minority languages. The aim of this work is to acquire a Spanish-Basque parallel corpus containing both text and speech data. To make the results comparable with those obtained for other languages, we took Europarl as a reference. Accordingly, the data were acquired from the reports and speeches of the Basque Parliament. The acquisition process differs subtly from the one used to build Europarl. We describe the resulting corpus and report a few preliminary machine translation experiments with Moses.
Index Terms: speech resources, statistical machine translation, under-resourced languages
Bibliographic reference. Pérez, Alicia / Alcaide, José M. / Torres, María-Inés (2012): "Euskoparl: a speech and text Spanish-Basque parallel corpus", In INTERSPEECH-2012, 2362-2365.