ISCA Archive Interspeech 2009

Reinforcement learning for dialog management using least-squares Policy iteration and fast feature selection

Lihong Li, Jason D. Williams, Suhrid Balakrishnan

Reinforcement learning (RL) is a promising technique for creating a dialog manager. RL accepts features of the current dialog state and seeks to find the best action given those features. Although it is often easy to posit a large set of potentially useful features, in practice it is difficult to find a subset that is large enough to contain useful information yet compact enough to reliably learn a good policy. In this paper, we propose a method for RL optimization that automatically performs feature selection. The algorithm is based on least-squares policy iteration, a state-of-the-art RL algorithm that is highly sample-efficient and can learn from a static corpus or online. Experiments in dialog simulation show that it is more stable than a baseline RL algorithm taken from a working dialog system.
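As a rough illustration of the least-squares policy iteration (LSPI) loop the abstract builds on, the sketch below follows the generic LSPI scheme of Lagoudakis and Parr: LSTD-Q evaluates the current greedy policy from a fixed corpus of logged transitions, and the policy is then improved greedily, reusing the same samples on every iteration. It is a minimal sketch under stated assumptions, not the authors' implementation, and it omits their feature-selection step entirely. The feature map `phi`, the ridge term `reg`, the convergence tolerance, and the two-state "ask"/"submit" toy corpus are all hypothetical choices made for illustration.

```python
from functools import partial


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def q_value(w, phi, s, a):
    # Linear Q-function: Q(s, a) = w . phi(s, a)
    return sum(wi * fi for wi, fi in zip(w, phi(s, a)))


def greedy_action(w, phi, actions, s):
    # Ties are broken in favour of the first action listed.
    return max(actions, key=lambda a: q_value(w, phi, s, a))


def lstdq(samples, phi, policy, k, gamma, reg=1e-6):
    """LSTD-Q: evaluate `policy` from a fixed corpus by solving A w = b.

    Each sample is (state, action, reward, next_state); next_state None means
    the dialog ended. `reg` is a hypothetical ridge term keeping A invertible.
    """
    A = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next)) if s_next is not None else [0.0] * k
        for i in range(k):
            b[i] += f[i] * r
            for j in range(k):
                A[i][j] += f[i] * (f[j] - gamma * f_next[j])
    return solve(A, b)


def lspi(samples, phi, actions, k, gamma=0.95, max_iters=20, tol=1e-6):
    """Alternate LSTD-Q evaluation and greedy improvement on the same samples."""
    w = [0.0] * k
    for _ in range(max_iters):
        policy = partial(greedy_action, w, phi, actions)
        w_new = lstdq(samples, phi, policy, k, gamma)
        if max(abs(x - y) for x, y in zip(w, w_new)) < tol:
            return w_new
        w = w_new
    return w


if __name__ == "__main__":
    # Hypothetical toy corpus: state 0 = slot unknown, state 1 = slot known;
    # action 0 = ask, action 1 = submit. One-hot (tabular) features, k = 4.
    corpus = [(0, 0, -1.0, 1), (0, 1, -5.0, None), (1, 0, -1.0, 1), (1, 1, 10.0, None)]

    def one_hot(s, a):
        v = [0.0] * 4
        v[2 * s + a] = 1.0
        return v

    w = lspi(corpus, one_hot, actions=[0, 1], k=4)
    print(greedy_action(w, one_hot, [0, 1], 0), greedy_action(w, one_hot, [0, 1], 1))  # → 0 1
```

With one-hot features LSTD-Q reduces to exact tabular evaluation on this deterministic toy corpus, so the learned policy asks first and then submits. With a large, hand-posited feature set, the A matrix grows as k² and can become ill-conditioned, which is the practical pressure that motivates selecting a compact subset of features.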


doi: 10.21437/Interspeech.2009-659

Cite as: Li, L., Williams, J.D., Balakrishnan, S. (2009) Reinforcement learning for dialog management using least-squares Policy iteration and fast feature selection. Proc. Interspeech 2009, 2475-2478, doi: 10.21437/Interspeech.2009-659

@inproceedings{li09c_interspeech,
  author={Lihong Li and Jason D. Williams and Suhrid Balakrishnan},
  title={{Reinforcement learning for dialog management using least-squares Policy iteration and fast feature selection}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2475--2478},
  doi={10.21437/Interspeech.2009-659}
}