INTERSPEECH 2004 - ICSLP
This paper describes a bootstrapping methodology for semi-automatic semantic annotation of a "mini-corpus", which is conventionally annotated by hand to train the initial parser used in natural language understanding (NLU) systems. We propose to cast semantic annotation as a classification problem: each word is assigned a unique set of semantic tag(s) and/or label(s) from the universal tag/label set. This approach enables "local" semantic annotation, resulting in partially annotated sentences. The proposed method reduces the annotation time and cost, which form a major bottleneck in the development of NLU systems. We present a set of experiments conducted on a medical-domain "mini-corpus" that contains 10K hand-annotated sentences. Three annotation methods are compared: parser-based (baseline), similarity-based, and classification-based annotation. The support vector machine (SVM) based classification scheme is shown to outperform both similarity-based and parser-based annotation.
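As a rough illustration of treating annotation as per-word classification, the sketch below turns each word into one classification instance described by context-window features and trains a classifier to predict its tag. It is not the system evaluated in the paper. To stay dependency-free it uses a multiclass perceptron as a stand-in for the SVM, and the tag names (`O`, `SYMPTOM`, `DRUG`, `FREQ`), the feature templates, and the toy sentences are all assumptions made for the example.

```python
from collections import defaultdict


def word_features(words, i):
    """Context-window features for the word at position i (assumed templates)."""
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    return [f"w={words[i]}", f"prev={prev_w}", f"next={next_w}"]


class PerceptronTagger:
    """Multiclass perceptron over sparse features; a simple stand-in for an SVM."""

    def __init__(self, tags):
        self.tags = sorted(tags)           # fixed order makes tie-breaking deterministic
        self.weights = defaultdict(float)  # (feature, tag) -> weight

    def score(self, feats, tag):
        return sum(self.weights[(f, tag)] for f in feats)

    def predict(self, feats):
        return max(self.tags, key=lambda t: self.score(feats, t))

    def train(self, sentences, epochs=10):
        for _ in range(epochs):
            for words, gold_tags in sentences:
                for i, gold in enumerate(gold_tags):
                    feats = word_features(words, i)
                    guess = self.predict(feats)
                    if guess != gold:
                        # Standard perceptron update: reward gold, penalize guess.
                        for f in feats:
                            self.weights[(f, gold)] += 1.0
                            self.weights[(f, guess)] -= 1.0

    def annotate(self, words):
        """Tag every word independently from its local context."""
        return [self.predict(word_features(words, i)) for i in range(len(words))]


# Hypothetical hand-annotated "mini-corpus" (toy data, not from the paper).
TRAIN = [
    (["i", "have", "a", "headache"], ["O", "O", "O", "SYMPTOM"]),
    (["i", "take", "aspirin", "daily"], ["O", "O", "DRUG", "FREQ"]),
]

tagger = PerceptronTagger({"O", "SYMPTOM", "DRUG", "FREQ"})
tagger.train(TRAIN)
print(tagger.annotate(["i", "have", "a", "headache"]))
```

Because each word is classified on its own, a sentence can be tagged even when no full parse is available, which is one way to read the "local" annotation described in the abstract. A real setup would swap the perceptron for an SVM and use a much richer feature set.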
Bibliographic reference. Sarikaya, Ruhi / Gao, Yuqing / Virga, Paola (2004): "Fast semi-automatic semantic annotation for spoken dialog systems", In INTERSPEECH-2004, 2281-2284.