8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Fast Semi-Automatic Semantic Annotation for Spoken Dialog Systems

Ruhi Sarikaya (1), Yuqing Gao (1), Paola Virga (2)

(1) IBM T.J. Watson Research Center, USA
(2) Johns Hopkins University, USA

This paper describes a bootstrapping methodology for semi-automatic semantic annotation of a "mini-corpus", which is conventionally annotated by hand to train the initial parser used in natural language understanding (NLU) systems. We propose to cast semantic annotation as a classification problem: each word is assigned a unique set of semantic tag(s) and/or label(s) from the universal tag/label set. This approach enables "local" semantic annotation, which results in partially annotated sentences. The proposed method reduces annotation time and cost, which form a major bottleneck in the development of NLU systems. We present a set of experiments conducted on a medical-domain "mini-corpus" that contains 10K hand-annotated sentences. Three annotation methods are compared: parser-based (baseline), similarity-based, and classification-based annotation. The support vector machine (SVM) based classification scheme is shown to outperform both similarity-based and parser-based annotation.
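The idea of annotation as per-word classification can be illustrated with a minimal sketch. It is not the authors' implementation: a plain multiclass perceptron stands in for the SVM, and the toy sentences, the tag names (`O`, `SYMPTOM`, `BODY_PART`), the context-window features, and the function names are all hypothetical choices made for illustration only.

```python
# Toy sketch: semantic annotation as word-level classification.
# Assumptions (not from the paper): the data, the tag names, the features,
# and the use of a perceptron in place of an SVM.

TRAIN = [
    [("i", "O"), ("have", "O"), ("a", "O"), ("headache", "SYMPTOM")],
    [("my", "O"), ("arm", "BODY_PART"), ("hurts", "SYMPTOM")],
    [("i", "O"), ("have", "O"), ("a", "O"), ("fever", "SYMPTOM")],
    [("my", "O"), ("leg", "BODY_PART"), ("hurts", "SYMPTOM")],
]


def features(words, i):
    """Local features for word i: the word itself and its two neighbours."""
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    return [f"w={words[i]}", f"prev={prev_w}", f"next={next_w}"]


def predict(weights, feats):
    """Return the highest-scoring tag; ties go to the first tag in sorted order."""
    def score(tag):
        return sum(weights[tag].get(f, 0.0) for f in feats)
    return max(sorted(weights), key=score)


def train(sentences, epochs=10):
    """Multiclass perceptron: one weight vector per tag, updated on mistakes."""
    tags = sorted({tag for sent in sentences for _, tag in sent})
    weights = {tag: {} for tag in tags}
    for _ in range(epochs):
        for sent in sentences:
            words = [w for w, _ in sent]
            for i, (_, gold) in enumerate(sent):
                feats = features(words, i)
                guess = predict(weights, feats)
                if guess != gold:
                    for f in feats:
                        weights[gold][f] = weights[gold].get(f, 0.0) + 1.0
                        weights[guess][f] = weights[guess].get(f, 0.0) - 1.0
    return weights


def annotate(weights, words):
    """Tag each word independently from its local context."""
    return [(w, predict(weights, features(words, i))) for i, w in enumerate(words)]


if __name__ == "__main__":
    model = train(TRAIN)
    # "cough" never appears in TRAIN; its tag comes from the context features.
    print(annotate(model, ["i", "have", "a", "cough"]))
```

Because each word is classified from its own local context, a tagger of this kind can annotate some words in a sentence without producing a full parse, which is the sense in which the annotation is "local".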

Bibliographic reference.  Sarikaya, Ruhi / Gao, Yuqing / Virga, Paola (2004): "Fast semi-automatic semantic annotation for spoken dialog systems", In INTERSPEECH-2004, 2281-2284.