Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Tagging Spoken Corpus

Yue-Shi Lee, Hsin-Hsi Chen

Dept. of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan

Spoken language is more flexible in usage than written language, so tagging a spoken corpus differs from tagging a traditional written corpus. This paper proposes a new framework for tagging spoken corpora. The framework applies a tagger developed for written text to spoken data, with special consideration of the characteristics of spoken language. It also addresses the differences between the tag sets used for written and spoken corpora. The approach attempts to reduce the differences between these two kinds of language, and preliminary tests give very encouraging results.

Bibliographic reference. Lee, Yue-Shi / Chen, Hsin-Hsi (1999): "Tagging spoken corpus", in EUROSPEECH'99, 2227-2230.