Reinforcement learning (RL) is a promising technique for creating a dialog manager. RL accepts features of the current dialog state and seeks to find the best action given those features. Although it is often easy to posit a large set of potentially useful features, in practice it is difficult to find a subset that is large enough to contain useful information yet compact enough to reliably learn a good policy. In this paper, we propose a method for RL optimization which automatically performs feature selection. The algorithm is based on least-squares policy iteration, a state-of-the-art RL algorithm which is highly sample-efficient and can learn from a static corpus or on-line. Experiments in dialog simulation show it is more stable than a baseline RL algorithm taken from a working dialog system.
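To make the idea of learning a policy from a static corpus concrete, the following is a minimal sketch of plain least-squares policy iteration on a toy two-state dialog. It is an illustration only: the states, actions, rewards, indicator features, discount factor, and ridge term are all assumptions made for the example, and the feature selection step proposed in the paper is not included. The small `solve` helper is a hand-written Gauss-Jordan routine used to keep the sketch dependency-free; a practical implementation would more likely call a linear algebra library.

```python
# Toy sketch of least-squares policy iteration (LSPI) on a static corpus.
# All states, actions, rewards, and features are hypothetical illustrations,
# not the setup or the feature-selection method of the paper.

GAMMA = 0.9        # discount factor (assumed)
RIDGE = 1e-6       # small diagonal term keeping the system well-posed (assumed)
ACTIONS = (0, 1)   # 0 = "ask", 1 = "submit"
TERMINAL = 2       # dialog finished
N_FEATURES = 4     # one indicator per (non-terminal state, action) pair

# Static corpus of (state, action, reward, next_state) transitions.
CORPUS = [
    (0, 0, -1.0, 1),          # slot unknown, ask: slot becomes known
    (0, 1, -20.0, TERMINAL),  # slot unknown, submit: wrong result
    (1, 0, -1.0, 1),          # slot known, ask again: wasted turn
    (1, 1, 10.0, TERMINAL),   # slot known, submit: success
]


def phi(state, action):
    """Indicator features; the terminal state maps to the zero vector."""
    vec = [0.0] * N_FEATURES
    if state != TERMINAL:
        vec[state * len(ACTIONS) + action] = 1.0
    return vec


def q_value(w, state, action):
    """Linear Q estimate: dot product of weights and features."""
    return sum(wi * fi for wi, fi in zip(w, phi(state, action)))


def greedy(w, state):
    """Action with the highest estimated Q (ties go to the first action)."""
    return max(ACTIONS, key=lambda a: q_value(w, state, a))


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def lstdq(corpus, w):
    """Evaluate the greedy policy of w by solving A w' = b over the corpus."""
    A = [[RIDGE if i == j else 0.0 for j in range(N_FEATURES)]
         for i in range(N_FEATURES)]
    b = [0.0] * N_FEATURES
    for s, a, r, s_next in corpus:
        f = phi(s, a)
        f_next = phi(s_next, greedy(w, s_next))
        for i in range(N_FEATURES):
            b[i] += f[i] * r
            for j in range(N_FEATURES):
                A[i][j] += f[i] * (f[j] - GAMMA * f_next[j])
    return solve(A, b)


def lspi(corpus, max_iters=20, tol=1e-8):
    """Alternate policy evaluation and greedy improvement until w settles."""
    w = [0.0] * N_FEATURES
    for _ in range(max_iters):
        new_w = lstdq(corpus, w)
        if max(abs(x - y) for x, y in zip(new_w, w)) < tol:
            return new_w
        w = new_w
    return w


weights = lspi(CORPUS)
policy = {s: greedy(weights, s) for s in (0, 1)}
```

On this toy corpus the learned policy asks when the slot is unknown and submits once it is known. Because every update is computed from the fixed list of transitions, the same routine could in principle be rerun as new dialogs are appended, which is one simple way to picture the on-line use mentioned above.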
Bibliographic reference. Li, Lihong / Williams, Jason D. / Balakrishnan, Suhrid (2009): "Reinforcement learning for dialog management using least-squares policy iteration and fast feature selection", in INTERSPEECH-2009, 2475-2478.