Deep Reinforcement Learning of Dialogue Policies with Less Weight Updates

Heriberto Cuayáhuitl, Seunghak Yu


Deep reinforcement learning dialogue systems are attractive because they can jointly learn their feature representations and policies without manual feature engineering. But their application is challenging due to slow learning. We propose a two-stage method for accelerating the induction of single or multi-domain dialogue policies. While the first stage reduces the number of weight updates over time, the second stage uses very limited minibatches (of as few as two learning experiences) sampled from experience replay memories. The former frequently updates the weights of the neural nets at early stages of training, and decreases the number of updates as training progresses by performing updates during exploration and by skipping updates during exploitation. The learning process is thus accelerated through fewer weight updates in both stages. An empirical evaluation in three domains (restaurants, hotels and TV guide) confirms that the proposed method trains policies 5 times faster than a baseline without the proposed method. Our findings are useful for training larger-scale neural-based spoken dialogue systems.
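The core idea of the two-stage method can be illustrated with a minimal sketch (an assumption on our part, not the authors' code): weight updates are performed only on exploratory actions and skipped on exploitative ones, so updates become rarer as exploration decays, and each update samples a very small minibatch (here, size 2) from an experience replay memory. The toy single-state, two-action environment, learning rate, and schedule below are all illustrative choices.

```python
import random

random.seed(0)

N_ACTIONS = 2
weights = [0.0] * N_ACTIONS   # toy linear Q-values: Q(a) = weights[a]
replay = []                   # experience replay memory of (action, reward)
MINIBATCH = 2                 # very limited minibatch, as in the paper
ALPHA = 0.1                   # learning rate (assumed value)

def act(epsilon):
    """Epsilon-greedy action selection; also reports whether it explored."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS), True
    return max(range(N_ACTIONS), key=lambda a: weights[a]), False

def reward(action):
    # Toy reward function: action 1 is the optimal choice.
    return 1.0 if action == 1 else 0.0

updates = 0
for step in range(500):
    epsilon = max(0.05, 1.0 - step / 400)   # decaying exploration rate
    action, exploratory = act(epsilon)
    replay.append((action, reward(action)))
    # Stage 1: update only during exploration, skip during exploitation,
    # so the number of weight updates shrinks as epsilon decays.
    if exploratory and len(replay) >= MINIBATCH:
        # Stage 2: sample a very small minibatch from the replay memory.
        for a, r in random.sample(replay, MINIBATCH):
            weights[a] += ALPHA * (r - weights[a])
        updates += 1

print(weights, updates)   # far fewer updates than the 500 steps taken
```

Despite performing updates on only a fraction of the steps, the sketch still converges to the better action, which mirrors the paper's claim that skipping updates during exploitation need not hurt policy quality.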


 DOI: 10.21437/Interspeech.2017-1060

Cite as: Cuayáhuitl, H., Yu, S. (2017) Deep Reinforcement Learning of Dialogue Policies with Less Weight Updates. Proc. Interspeech 2017, 2511-2515, DOI: 10.21437/Interspeech.2017-1060.


@inproceedings{Cuayáhuitl2017,
  author={Heriberto Cuayáhuitl and Seunghak Yu},
  title={Deep Reinforcement Learning of Dialogue Policies with Less Weight Updates},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2511--2515},
  doi={10.21437/Interspeech.2017-1060},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1060}
}