Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

The Translanguage English Database (TED)

Lori F. Lamel (1), Florian Schiel (2), Adrian Fourcin (3), Joseph Mariani (1), Hans G. Tillmann (2)

(1) LIMSI-CNRS, Orsay Cedex, France
(2) Institute of Phonetics, Univ. Munich, München, Germany
(3) University College London, London, UK

The Translanguage English Database is a corpus of recordings of oral presentations given at Eurospeech93 in Berlin. The corpus name derives from the high percentage of presentations given in English by non-native speakers of English. A total of 224 oral presentations at the conference were successfully recorded, providing about 75 hours of speech material. These recordings capture a relatively large number of speakers, each speaking a variant of the same language (English) on a specific topic for a relatively long time (a 15-minute presentation plus 5 minutes of discussion). A subset of speakers were recorded with a laryngograph in addition to the standard microphone. A set of Polyphone-like recordings was also made, a subset of which include a laryngograph signal. The Polyphone-like recordings were made in English and in the speaker's mother language.

In addition to the spoken material, associated text materials are being collected. These include written versions of the proceedings papers and any texts prepared for the oral presentations that were made available. The text materials will provide vocabulary items and data for language modeling. Speakers were also asked to complete a short questionnaire about their mother language, any other languages they speak, and their knowledge of English.


Bibliographic reference.  Lamel, Lori F. / Schiel, Florian / Fourcin, Adrian / Mariani, Joseph / Tillmann, Hans G. (1994): "The translanguage English database (TED)", In ICSLP-1994, 1795-1798.