9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Minimal Training Based Semantic Categorization in a Voice Activated Question Answering (VAQA) System

Mithun Balakrishna, Marta Tatu, Dan Moldovan

Lymba Corporation, USA

In this paper, we develop a knowledge-based methodology that maps Automatic Speech Recognizer (ASR) transcriptions to predefined semantic categories in a Voice Activated Question Answering (VAQA) system. The proposed semantic categorization methodology, SemCat, uses a novel lexical chains/ontology-based algorithm and relies heavily on customized but domain-independent Natural Language Processing (NLP) tools. It does not require any domain-specific utterance collections or manually annotated text data. SemCat requires minimal manual intervention during training, relying only on the semantics encoded in a brief, manually created description for each predefined category/slot. SemCat uses these descriptions along with the eXtended WordNet Knowledge Base (XWN-KB) and several domain-independent NLP tools, including XWN lexical chains, to accurately extract information and map user utterances to predefined categories. SemCat also uses the domain ontologies created automatically by the Jaguar knowledge acquisition tool to accurately extract domain/customer-specific language and terms.
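
The abstract does not give the algorithm itself, so the following is only a rough illustration of the general idea of description-driven categorization: an utterance is scored against each category's short description, and words count as related when a short chain of semantic relations connects them. Everything here is an assumption made for illustration: the `RELATIONS` graph is a tiny hand-made stand-in for XWN-KB (not real WordNet data), the category names and descriptions are invented, and the breadth-first search with a `1 / (1 + hops)` weight is a simple substitute for the paper's lexical chain scoring, not a reproduction of it.

```python
from collections import deque

# Hypothetical toy relation graph standing in for XWN-KB (not real WordNet data).
RELATIONS = {
    "bill": {"invoice", "payment"},
    "invoice": {"bill"},
    "payment": {"bill", "pay"},
    "pay": {"payment"},
    "password": {"login", "credential"},
    "login": {"password", "account"},
    "credential": {"password"},
    "account": {"login"},
}

# Hypothetical categories, each with a brief manually written description.
CATEGORY_DESCRIPTIONS = {
    "billing": "questions about a bill or payment",
    "account_access": "problems with login or password",
}

STOPWORDS = {"a", "an", "the", "or", "and", "about", "with", "my", "i", "to", "of"}


def content_words(text):
    """Lowercase, split on whitespace, and drop function words."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def chain_length(source, target, max_hops=3):
    """Return the number of relation hops linking two words, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        word, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not extend chains beyond the hop limit
        for neighbor in RELATIONS.get(word, ()):
            if neighbor == target:
                return hops + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


def score(utterance, description):
    """Sum, over utterance words, the weight of the shortest chain to the description."""
    desc_words = content_words(description)
    total = 0.0
    for u in content_words(utterance):
        best = 0.0
        for d in desc_words:
            hops = chain_length(u, d)
            if hops is not None:
                best = max(best, 1.0 / (1 + hops))
        total += best
    return total


def categorize(utterance):
    """Return the best-scoring category name, or None if nothing is related."""
    scores = {name: score(utterance, desc)
              for name, desc in CATEGORY_DESCRIPTIONS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

In this sketch the only per-category input is the one-line description, which mirrors the "minimal training" setting described above; no annotated utterances are involved. A real system would replace the toy graph with a full lexical knowledge base and handle ASR errors, punctuation, and morphology, none of which are addressed here.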


Bibliographic reference. Balakrishna, Mithun / Tatu, Marta / Moldovan, Dan (2008): "Minimal training based semantic categorization in a voice activated question answering (VAQA) system", in INTERSPEECH-2008, 479-482.