12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Bootstrapping Domain Detection Using Query Click Logs for New Domains

Dilek Hakkani-Tür, Gokhan Tur, Larry Heck, Elizabeth Shriberg

Microsoft Speech Labs, USA

Domain detection in spoken dialog systems is usually treated as a multi-class, multi-label classification problem, and training domain classifiers requires collecting and manually annotating example utterances. To extend a dialog system to new domains in a way that is seamless for users, domain detection should be able to handle utterances from a new domain as soon as it is introduced. In this work, we propose using web search query click logs, which record the queries entered by users and the links they subsequently click on, to bootstrap domain detection for new domains. When sampling user queries from the query click logs to train new domain classifiers, we introduce two types of measures: one based on the behavior of the users who entered a query, and one based on the form of the query. We show that both types of measures reduce the error rate compared with randomly sampling training queries. In controlled experiments over five domains, we achieve the best gain by combining the two types of sampling criteria.
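
The abstract does not define the sampling measures, so the following is only a minimal sketch of how a user-behavior criterion might be applied when selecting training queries from a click log. It assumes, hypothetically, that each log record is a (query, clicked site, click count) triple, that a new domain is described by a set of associated sites, and that a query is kept when its clicks are concentrated (low click entropy) and land mostly on those sites. The function names, thresholds, and example data are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter, defaultdict


def click_entropy(url_counts):
    """Shannon entropy (in bits) of one query's click distribution over sites."""
    total = sum(url_counts.values())
    return -sum(
        (c / total) * math.log2(c / total) for c in url_counts.values() if c > 0
    )


def sample_queries(click_log, domain_sites, max_entropy=1.0, min_domain_share=0.5):
    """Keep queries whose clicks are focused and land mostly on domain sites.

    Both thresholds are hypothetical defaults chosen for illustration.
    Click counts are assumed to be positive integers.
    """
    # Aggregate click counts per query.
    per_query = defaultdict(Counter)
    for query, url, count in click_log:
        per_query[query][url] += count

    selected = []
    for query, url_counts in per_query.items():
        total = sum(url_counts.values())
        domain_clicks = sum(c for u, c in url_counts.items() if u in domain_sites)
        focused = click_entropy(url_counts) <= max_entropy
        in_domain = domain_clicks / total >= min_domain_share
        if focused and in_domain:
            selected.append(query)
    return sorted(selected)


# Made-up example records: (query, clicked site, number of clicks).
click_log = [
    ("flights to florence", "travel.example", 9),
    ("flights to florence", "news.example", 1),
    ("florence", "travel.example", 3),
    ("florence", "wiki.example", 3),
    ("florence", "news.example", 3),
    ("weather today", "weather.example", 10),
]
domain_sites = {"travel.example"}  # hypothetical sites for a "travel" domain

candidates = sample_queries(click_log, domain_sites)
```

In this toy log, only queries with concentrated, in-domain clicks pass the filter; the selected queries would then serve as automatically labeled training examples for the new domain's classifier. A query-form criterion, which the abstract mentions but does not specify, could be added as a further filter in the same loop.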

Bibliographic reference. Hakkani-Tür, Dilek / Tur, Gokhan / Heck, Larry / Shriberg, Elizabeth (2011): "Bootstrapping domain detection using query click logs for new domains", In INTERSPEECH-2011, 709-712.