We present a semi-supervised framework for constructing spoken language understanding resources at very low cost. Starting from a few seed entities and a large set of unlabeled utterances, we generate context patterns and use them to extract new entities from the unlabeled utterances. The extracted entities are added to the seed entities, and repeating these steps yields an extended entity list. Our method is based on an utterance alignment algorithm, a variant of the biological sequence alignment algorithm. With this method, we obtain precise entity lists with high coverage, which helps reduce the cost of building resources for statistical spoken language understanding systems.
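The bootstrapping loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which alignment variant is used, so the sketch assumes plain Needleman-Wunsch global alignment over tokens as a stand-in. The single-token entities, the `<ENT>` slot symbol, the +1/-1 scoring, the `min_ratio` threshold, and all function names are hypothetical choices made only for illustration.

```python
SLOT = "<ENT>"   # hypothetical placeholder marking the entity position in a pattern
GAP = -1         # assumed gap penalty


def token_score(pattern_tok, utt_tok):
    """Assumed scoring: the slot matches any token; otherwise exact match or mismatch."""
    return 1 if pattern_tok == SLOT or pattern_tok == utt_tok else -1


def align(pattern, utterance):
    """Needleman-Wunsch global alignment of two token sequences.

    Returns (score, pairs), where pairs is a list of (pattern_tok, utt_tok)
    and None marks a gap on that side.
    """
    n, m = len(pattern), len(utterance)
    # score[i][j] = best score aligning pattern[:i] with utterance[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + token_score(pattern[i - 1], utterance[j - 1]),
                score[i - 1][j] + GAP,
                score[i][j - 1] + GAP,
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and score[i][j] ==
                score[i - 1][j - 1] + token_score(pattern[i - 1], utterance[j - 1])):
            pairs.append((pattern[i - 1], utterance[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            pairs.append((pattern[i - 1], None))
            i -= 1
        else:
            pairs.append((None, utterance[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


def make_patterns(entities, utterances):
    """Turn every utterance containing a known entity into a context pattern."""
    patterns = set()
    for utt in utterances:
        for k, tok in enumerate(utt):
            if tok in entities:
                patterns.add(tuple(utt[:k]) + (SLOT,) + tuple(utt[k + 1:]))
    return patterns


def extract(patterns, utterances, min_ratio):
    """Collect tokens aligned to the slot in sufficiently well-aligned utterances."""
    found = set()
    for pat in patterns:
        for utt in utterances:
            score, pairs = align(pat, utt)
            if score >= min_ratio * len(pat):
                found.update(t for p, t in pairs if p == SLOT and t is not None)
    return found


def bootstrap(seeds, utterances, min_ratio=0.75, max_rounds=5):
    """Repeat pattern generation and extraction until no new entity appears."""
    entities = set(seeds)
    for _ in range(max_rounds):
        new = extract(make_patterns(entities, utterances), utterances, min_ratio) - entities
        if not new:
            break
        entities |= new
    return entities


# Toy unlabeled utterances, invented for this example only.
UTTERANCES = [s.split() for s in [
    "show flights to boston",
    "show flights to denver",
    "show me flights to seattle",
    "book a ticket from denver",
    "book a ticket from chicago",
    "what is the weather today",
]]

if __name__ == "__main__":
    print(sorted(bootstrap({"boston"}, UTTERANCES)))
```

Starting from the single seed `boston`, the first round picks up `denver` and `seattle` (the latter through an alignment with one gap), and the second round uses the new pattern around `denver` to reach `chicago`. A real system would need multi-token entities and a stricter pattern filter to keep precision high; those details are outside what the abstract specifies.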
Bibliographic reference. Kim, Seokhwan / Jeong, Minwoo / Lee, Gary Geunbae (2007): "A semi-supervised method for efficient construction of statistical spoken language understanding resources", In INTERSPEECH-2007, 2797-2800.