8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


The NESPOLE! VoIP Multilingual Corpora in Tourism and Medical Domains

Nadia Mana (1), Susanne Burger (2), Roldano Cattoni (1), Laurent Besacier (3), Victoria MacLaren (2), John McDonough (4), Florian Metze (4)

(1) ITC-irst, Italy
(2) Carnegie Mellon University, USA
(3) CLIPS-IMAG Laboratory, France
(4) Universität Karlsruhe, Germany

In this paper we present the multilingual VoIP (Voice over Internet Protocol) corpora collected for the second showcase of the NESPOLE! project in the tourism and medical domains. The corpora comprise over 20 hours of human-to-human monolingual dialogues in English, French, German and Italian: 66 dialogues in the tourism domain and 49 in the medical domain. We describe the data collection in detail (technical set-up, scenarios for each domain, recording procedure and data transcription), and we present corpus statistics and a preliminary data analysis.


Bibliographic reference. Mana, Nadia / Burger, Susanne / Cattoni, Roldano / Besacier, Laurent / MacLaren, Victoria / McDonough, John / Metze, Florian (2003): "The NESPOLE! VoIP multilingual corpora in tourism and medical domains", in EUROSPEECH-2003, 1589-1592.