Sixth European Conference on Speech Communication and Technology
Spoken language is more flexible in usage than written language, so tagging a spoken corpus differs from tagging a traditional written corpus. This paper proposes a new framework for tagging spoken corpora. The framework applies a tagger developed for written text to spoken data, with special consideration of the characteristics of spoken language. It also addresses the mismatch between the tag sets used for written and spoken corpora. The approach attempts to reduce the differences between these two kinds of language systems, and preliminary tests give very encouraging results.
Bibliographic reference. Lee, Yue-Shi / Chen, Hsin-Hsi (1999): "Tagging spoken corpus", In EUROSPEECH'99, 2227-2230.