Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Construction and Utilization of Bilingual Speech Corpus for Simultaneous Machine Interpretation Research

Hitomi Tohyama (1), Shigeki Matsubara (1), Nobuo Kawaguchi (1), Yasuyoshi Inagaki (2)

(1) Nagoya University, Japan; (2) Aichi Prefectural University, Japan

This paper describes the design, analysis and utilization of a simultaneous interpretation corpus. The corpus has been constructed at the Center for Integrated Acoustic Information Research (CIAIR) of Nagoya University to promote the realization of a multilingual communication support environment. The transcribed data amounts to about 1 million words, which places the corpus among the largest simultaneous interpretation corpora in the world. The corpus has been annotated with discourse tags and utterance time tags, and software tools for corpus analysis have been developed to support its practical use. The corpus is therefore expected to be useful not only for the development of simultaneous interpreting systems but also for the construction of an interpreting theory.

Bibliographic reference.  Tohyama, Hitomi / Matsubara, Shigeki / Kawaguchi, Nobuo / Inagaki, Yasuyoshi (2005): "Construction and utilization of bilingual speech corpus for simultaneous machine interpretation research", In INTERSPEECH-2005, 1585-1588.