Domain-Independent User Satisfaction Reward Estimation for Dialogue Policy Learning

Stefan Ultes, Paweł Budzianowski, Iñigo Casanueva, Nikola Mrkšić, Lina Rojas-Barahona, Pei-Hao Su, Tsung-Hsien Wen, Milica Gašić, Steve Young


Learning suitable and well-performing dialogue behaviour in statistical spoken dialogue systems has been a focus of research for many years. While most work based on reinforcement learning employs an objective measure such as task success to model the reward signal, we propose to use a reward based on user satisfaction. We show in simulated experiments that a live user satisfaction estimation model can be applied, resulting in higher estimated satisfaction while achieving similar success rates. Moreover, we show that a satisfaction estimation model trained on one domain can be applied in many other domains that cover a similar task. We verify our findings by applying the model in one of the domains to learn a policy from real users, and we compare its performance to policies that use the user satisfaction and task success acquired directly from the users as reward.


DOI: 10.21437/Interspeech.2017-1032

Cite as: Ultes, S., Budzianowski, P., Casanueva, I., Mrkšić, N., Rojas-Barahona, L., Su, P., Wen, T., Gašić, M., Young, S. (2017) Domain-Independent User Satisfaction Reward Estimation for Dialogue Policy Learning. Proc. Interspeech 2017, 1721-1725, DOI: 10.21437/Interspeech.2017-1032.


@inproceedings{Ultes2017,
  author={Stefan Ultes and Paweł Budzianowski and Iñigo Casanueva and Nikola Mrkšić and Lina Rojas-Barahona and Pei-Hao Su and Tsung-Hsien Wen and Milica Gašić and Steve Young},
  title={Domain-Independent User Satisfaction Reward Estimation for Dialogue Policy Learning},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1721--1725},
  doi={10.21437/Interspeech.2017-1032},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1032}
}