10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Reinforcement Learning for Dialog Management Using Least-Squares Policy Iteration and Fast Feature Selection

Lihong Li (1), Jason D. Williams (2), Suhrid Balakrishnan (2)

(1) Rutgers University, USA
(2) AT&T Labs Research, USA

Reinforcement learning (RL) is a promising technique for creating a dialog manager. RL accepts features of the current dialog state and seeks to find the best action given those features. Although it is often easy to posit a large set of potentially useful features, in practice it is difficult to find a subset that is large enough to contain useful information yet compact enough to reliably learn a good policy. In this paper, we propose a method for RL optimization that automatically performs feature selection. The algorithm is based on least-squares policy iteration, a state-of-the-art RL algorithm that is highly sample-efficient and can learn either from a static corpus or on-line. Experiments in dialog simulation show that it is more stable than a baseline RL algorithm taken from a working dialog system.
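For readers unfamiliar with least-squares policy iteration, the following is a minimal sketch of the generic algorithm learning from a static corpus. It is not the method proposed in the paper: it omits feature selection entirely, and the toy two-state problem, the one-hot features, the ridge term, and all function names are assumptions made purely for illustration.

```python
# Toy 2-state, 2-action problem (hypothetical, not from the paper).
# Action 0 stays in the current state, action 1 switches to the other state.
# Reward is 1 when the next state is state 1, otherwise 0.
N_STATES, N_ACTIONS, GAMMA = 2, 2, 0.9

# Static corpus of (state, action, reward, next_state) samples.
SAMPLES = [(0, 0, 0.0, 0), (0, 1, 1.0, 1), (1, 0, 1.0, 1), (1, 1, 0.0, 0)]


def features(s, a):
    """One-hot feature vector for a state-action pair."""
    phi = [0.0] * (N_STATES * N_ACTIONS)
    phi[s * N_ACTIONS + a] = 1.0
    return phi


def q_value(w, s, a):
    """Linear Q-value estimate: dot product of weights and features."""
    return sum(wi * fi for wi, fi in zip(w, features(s, a)))


def greedy(w, s):
    """Action with the highest estimated Q-value (ties go to the lowest index)."""
    return max(range(N_ACTIONS), key=lambda a: q_value(w, s, a))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def lspi(samples, gamma=GAMMA, ridge=1e-6, max_iter=20, tol=1e-8):
    """Alternate least-squares Q evaluation and greedy policy improvement."""
    k = N_STATES * N_ACTIONS
    w = [0.0] * k
    for _ in range(max_iter):
        # Small ridge term on the diagonal keeps A invertible (an assumption).
        A = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
        b = [0.0] * k
        for s, a, r, s_next in samples:
            phi = features(s, a)
            # Features of the action the current greedy policy takes next.
            phi_next = features(s_next, greedy(w, s_next))
            for i in range(k):
                b[i] += phi[i] * r
                for j in range(k):
                    A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
        w_new = solve(A, b)
        converged = max(abs(x - y) for x, y in zip(w, w_new)) < tol
        w = w_new
        if converged:
            break
    return w


weights = lspi(SAMPLES)
policy = [greedy(weights, s) for s in range(N_STATES)]
print(policy)
```

On this toy corpus the learned greedy policy switches out of state 0 and stays in state 1. The same fixed set of samples is reused at every iteration, which is what allows this family of algorithms to learn from a static corpus.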


Bibliographic reference. Li, Lihong / Williams, Jason D. / Balakrishnan, Suhrid (2009): "Reinforcement learning for dialog management using least-squares policy iteration and fast feature selection", In INTERSPEECH-2009, 2475-2478.