13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Preference-learning based Inverse Reinforcement Learning for Dialog Control

Hiroaki Sugiyama, Toyomi Meguro, Yasuhiro Minami

NTT Communication Science Laboratories, Kyoto, Japan

Dialog systems that realize dialog control with reinforcement learning have recently been proposed. However, reinforcement learning has an open problem: it requires a reward function that is difficult to set appropriately. To set an appropriate reward function automatically, we propose preference-learning based inverse reinforcement learning (PIRL), which estimates a reward function from dialog sequences and their pairwise preferences, computed from ratings annotated on the sequences. Inverse reinforcement learning finds a reward function with which a system generates sequences similar to the training ones. This indicates that current IRL assumes the sequences are equally appropriate for a given task; thus, it cannot utilize the ratings. In contrast, our PIRL can utilize pairwise preferences derived from the ratings to estimate the reward function. We examine the advantages of PIRL through comparisons with competitive algorithms that have been widely used to realize dialog control. Our experiments show that PIRL outperforms the other algorithms and has the potential to serve as an evaluation simulator for dialog control.
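The abstract does not specify how the reward function is estimated from pairwise preferences; the following is a minimal illustrative sketch of preference-based reward learning, assuming a linear reward over sequence features and a Bradley-Terry-style logistic model over preference pairs (an assumed formulation for illustration, not necessarily the authors' exact method; all names are hypothetical):

```python
import math

def learn_reward(features, prefs, dim, lr=0.1, epochs=200):
    """Estimate a linear reward weight vector w so that preferred dialog
    sequences score higher, using a logistic (Bradley-Terry) model.
    features[i] : feature vector of dialog sequence i
    prefs       : (winner, loser) index pairs derived from annotated ratings
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for win, lose in prefs:
            # margin = R(winner) - R(loser) under the current reward
            diff = [features[win][k] - features[lose][k] for k in range(dim)]
            margin = sum(w[k] * diff[k] for k in range(dim))
            # gradient of log sigmoid(margin) w.r.t. w is (1 - sigmoid(margin)) * diff
            g = 1.0 / (1.0 + math.exp(margin))
            for k in range(dim):
                w[k] += lr * g * diff[k]
    return w

# Toy example: three dialog sequences with 2-d features;
# the annotated ratings imply sequence 0 > 1 > 2.
feats = [[1.0, 0.2], [0.5, 0.5], [0.1, 0.9]]
pairs = [(0, 1), (1, 2), (0, 2)]
w = learn_reward(feats, pairs, dim=2)
scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in feats]
```

Under this sketch, the learned reward ranks the sequences consistently with the annotated preference order, which is the property PIRL exploits that rating-agnostic IRL cannot.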


Bibliographic reference. Sugiyama, Hiroaki / Meguro, Toyomi / Minami, Yasuhiro (2012): "Preference-learning based inverse reinforcement learning for dialog control", In INTERSPEECH-2012, 222-225.