8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Large Lexica for Speech-to-Speech Translation: From Specification to Creation

Elviira Hartikainen (1), Giulio Maltese (2), Asunción Moreno (3), Shaunie Shammass (4), Ute Ziegenhain (5)

(1) Nokia Research Center, Finland
(2) IBM Italy, Italy
(3) Universitat Politècnica de Catalunya, Spain
(4) Natural Speech Communication, Israel
(5) Siemens AG, Germany

This paper describes the corpus collection and lexicon creation for the Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) components needed in speech-to-speech translation (SST). The lexica will be specified, built and validated within the EU project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Components), which runs from 2002 to 2005. Large lexica with phonetic, prosodic and morpho-syntactic content will be provided, together with well-documented specifications, for at least 12 languages [1]. The paper gives a short overview of lexica for speech-to-speech translation in general and a summary of the LC-STAR project itself. The specifications for corpus collection and word extraction, and the specification and format of the lexica, are presented in more detail in later sections.

Bibliographic reference. Hartikainen, Elviira / Maltese, Giulio / Moreno, Asunción / Shammass, Shaunie / Ziegenhain, Ute (2003): "Large lexica for speech-to-speech translation: from specification to creation", in EUROSPEECH-2003, 1529-1532.