We show how to learn optimal dialogue policies for a wide range of database search applications. The policies decide how many database search results to present to the user and when to present them. We apply Reinforcement Learning methods across a wide spectrum of database simulations, turn penalty conditions, and noise conditions. Our objective is to show that our policy learning framework covers this spectrum. We show that, even in challenging cases, learned policies significantly outperform hand-coded policies tailored to the different operating situations. The policies are adaptive and context-sensitive with respect to both the overall operating situation (e.g. noise) and the local context of the interaction (e.g. the user's last move). On average, the learned policies achieve a relative increase in reward of 25.7% over the corresponding threshold-based hand-coded baseline policies.
Bibliographic reference. Rieser, Verena / Lemon, Oliver (2007): "Learning dialogue strategies for interactive database search", In INTERSPEECH-2007, 2689-2692.