Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Data Collection and Processing in a Chinese Spontaneous Speech Corpus IIS_CSS

JunLan Feng, XianFang Wang, LiMin Du

Institute of Acoustics, Chinese Academy of Sciences, Beijing, China

In this paper we report on the first phase of collecting the speech corpus IIS_CSS for the CEST (Chinese-English speech translation) project. The corpus is intended to provide training material for speaker-independent spontaneous Chinese speech recognition and automatic dialogue management over the telephone line. We describe the collection procedures, processing methods, annotation, and contents of the corpus, which consists of two parts: human-human dialogues and human-machine dialogues. At present, 10 hours of speech and the associated annotation have been completed. Finally, we present our plan for future collection.

Bibliographic reference.  Feng, JunLan / Wang, XianFang / Du, LiMin (2000): "Data collection and processing in a Chinese spontaneous speech corpus IIS_CSS", In ICSLP-2000, vol.3, 394-397.