Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

CASS: A Phonetically Transcribed Corpus of Mandarin Spontaneous Speech

Aijun Li (1), Fang Zheng (2), William Byrne (3), Pascale Fung (4), Terri Kamm (3), Yi Liu (4), Zhanjiang Song (2), Umar Ruhi (5), Veera Venkataramani (3), XiaoXia Chen (1)

(1) Chinese Academy of Social Sciences, Beijing, China
(2) Tsinghua University, Beijing, China
(3) Johns Hopkins University, Baltimore, MD, USA
(4) Hong Kong University of Science and Technology, Hong Kong
(5) University of Toronto, Canada

The Chinese Annotated Spontaneous Speech (CASS) corpus is a collection of spoken Chinese that has been phonetically transcribed to capture spontaneous speech and language effects. It was created to begin collecting samples of most of the phonetic variations that pronunciation effects produce in Mandarin spontaneous speech, including allophonic changes, phoneme reduction, phoneme deletion and insertion, and duration changes. The corpus is intended for use in pronunciation modeling for improved automatic speech recognition and will be used at the 2000 Johns Hopkins University Language Engineering Workshop by the project on Pronunciation Modeling of Mandarin Casual Speech.

Bibliographic reference. Li, Aijun / Zheng, Fang / Byrne, William / Fung, Pascale / Kamm, Terri / Liu, Yi / Song, Zhanjiang / Ruhi, Umar / Venkataramani, Veera / Chen, XiaoXia (2000): "CASS: a phonetically transcribed corpus of Mandarin spontaneous speech", In ICSLP-2000, vol.1, 485-488.