Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

A Complementary Approach to Computer-Aided Transcription: Synergy of Statistical-Based and Knowledge Discovery Paradigms

Benjamin K. T'sou, Tom B. Y. Lai

Language Information Sciences Research Center, City University of Hong Kong

The recent implementation of legal bilingualism necessitates the development of a Chinese Computer-Aided Transcription (CAT) system to produce Chinese transcripts of court proceedings conducted in Cantonese. The transcription system converts transcription shorthand codes into Chinese text, i.e., from a phonetic to a textual representation of the language. Because Cantonese and Mandarin Chinese have many homophonous characters, the main challenge lies in resolving the severe ambiguity of this conversion. An N-gram statistical model is incorporated to estimate the most probable character string during conversion, and domain-specific corpora have been compiled to support the statistical computation. With additional enhancement features, the CAT system delivers a transcription accuracy of 96%. An intelligent error detection tool is built into the system to facilitate the manual correction of the remaining errors. Using a decision tree algorithm and a range of textual and linguistic attributes, the system can effectively alert users to possible errors.
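
To make the conversion step concrete, the following is a minimal sketch of how an N-gram model can pick among homophonous characters. It is not the system's actual implementation: it assumes a bigram model with add-one smoothing and Viterbi search, uses romanized syllables as stand-ins for the shorthand codes, and all names (`CANDIDATES`, `UNIGRAM`, `BIGRAM`, `decode`) and counts are hypothetical and invented for illustration.

```python
import math

# Hypothetical lexicon: phonetic code -> homophonous candidate characters.
# Romanized syllables stand in for the real shorthand codes.
CANDIDATES = {
    "faat3": ["法", "發"],
    "ting4": ["庭", "停"],
}

# Toy counts, invented for illustration; a real system would estimate
# them from a domain-specific corpus.
UNIGRAM = {"法": 50, "發": 60, "庭": 30, "停": 40}
BIGRAM = {("法", "庭"): 25, ("發", "停"): 1}

VOCAB = len(UNIGRAM)
TOTAL = sum(UNIGRAM.values())


def log_p_start(ch):
    """Add-one smoothed unigram log-probability for the first character."""
    return math.log((UNIGRAM.get(ch, 0) + 1) / (TOTAL + VOCAB))


def log_p_bigram(prev, ch):
    """Add-one smoothed log P(ch | prev)."""
    return math.log((BIGRAM.get((prev, ch), 0) + 1) / (UNIGRAM.get(prev, 0) + VOCAB))


def decode(codes):
    """Return the most probable character string for a code sequence (Viterbi)."""
    if not codes:
        return ""
    # best[ch] = (log-probability of the best path ending in ch, that path)
    best = {ch: (log_p_start(ch), ch) for ch in CANDIDATES[codes[0]]}
    for code in codes[1:]:
        step = {}
        for ch in CANDIDATES[code]:
            # Keep only the best-scoring predecessor for each candidate.
            score, path = max(
                (prev_score + log_p_bigram(prev, ch), prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            step[ch] = (score, path + ch)
        best = step
    return max(best.values())[1]


print(decode(["faat3", "ting4"]))
```

With these toy counts, each character of 法庭 ("court") is less frequent on its own than its homophone, yet the bigram context makes 法庭 the highest-scoring string, which is the kind of ambiguity a context-free lookup cannot resolve.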

