Within the EU-funded project LC-STAR, a set of Language Resources (LRs) was developed for all the components of Speech to Speech Translation (Speech Recognition, Machine Translation and Speech Synthesis). This paper deals with the development of bilingual corpora in Spanish, US English and Catalan. The corpora were obtained from spontaneous dialogues recorded in one of these three languages and then translated into the other two. The paper describes the translation methodology, the specific problems of translating spontaneous dialogues intended for MT training, the formats, and the validation criteria.
Cite as: Conejero, D., Lounds, A., Mateo, C.G., Rodriguez-Linares, L., Mochales, R., Moreno, A. (2005) Bilingual aligned corpora for speech to speech translation for Spanish, English and Catalan. Proc. Interspeech 2005, 1573-1576, doi: 10.21437/Interspeech.2005-460
@inproceedings{conejero05_interspeech,
  author={David Conejero and Alan Lounds and Carmen Garcia Mateo and Leandro Rodriguez-Linares and Raquel Mochales and Asuncion Moreno},
  title={{Bilingual aligned corpora for speech to speech translation for Spanish, English and Catalan}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={1573--1576},
  doi={10.21437/Interspeech.2005-460}
}