Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

The Acquisition of a Speech Corpus for Limited Domain Translation

Demetrio Aiello, Loredana Cerrato, Cristina Delogu, Andrea Di Carlo

Fondazione Ugo Bordoni, Rome, Italy

In this paper we report on the first phase of the speech corpus collection for the ESPRIT LTR project n. 30268, EuTrans. The corpus is intended to provide training material for speaker-independent continuous speech recognition and translation over the telephone line, based on a vocabulary of a few thousand words. Because of this application, the corpus is structured so as to contain speech material for acoustic modelling, and textual material for language modelling and translation modelling. The speech material that is being collected, and that we describe in this paper, has been produced in a natural way. We describe the corpus with the aid of some statistical results that illustrate the characteristics of the acquired material. Finally, we present our plans for the collection of other parts of the corpus and, in particular, introduce a new "dialogue oriented" collection paradigm.

Bibliographic reference. Aiello, Demetrio / Cerrato, Loredana / Delogu, Cristina / Di Carlo, Andrea (1999): "The acquisition of a speech corpus for limited domain translation", In EUROSPEECH'99, 2223-2226.