This paper describes a speech and language database that we are currently constructing for research on machine interpretation. The database contains bilingual data from lectures and dialogues. We have collected about 72 hours of speech in total and manually transcribed it into text. We have investigated the database in order to acquire empirical knowledge of human interpreting. In this paper, we report the characteristic features of the spoken language of Japanese-to-English interpreters.
Cite as: Aizawa, Y., Matsubara, S., Kawaguchi, N., Toyama, K., Inagaki, Y. (2000) Spoken language corpus for machine interpretation research. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 398-401, doi: 10.21437/ICSLP.2000-559
@inproceedings{aizawa00_icslp,
  author={Yasuyuki Aizawa and Shigeki Matsubara and Nobuo Kawaguchi and Katsuhiko Toyama and Yasuyoshi Inagaki},
  title={{Spoken language corpus for machine interpretation research}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={3},
  pages={398--401},
  doi={10.21437/ICSLP.2000-559}
}