Comparison of an End-to-end Trainable Dialogue System with a Modular Statistical Dialogue System

Norbert Braunschweiler, Alexandros Papangelis


This paper presents a comparison of two dialogue systems: one is end-to-end trainable and the other uses a more traditional, modular architecture. End-to-end trainable dialogue systems have recently attracted a lot of attention because they offer several advantages over traditional systems. One advantage is that they avoid the need to train each system module independently: a single network architecture maps an input to the corresponding output without intermediate representations. While the end-to-end system investigated here had been tested in a text-in/out scenario, it remained an open question how the system would perform in a speech-in/out scenario, with noisy input from a speech recognizer and output speech generated by a speech synthesizer. To evaluate this, both dialogue systems were trained on the same corpus, which includes human-human dialogues in the Cambridge restaurant domain, and then compared in both scenarios by human evaluation. The results show that in both interfaces the end-to-end system receives significantly higher ratings on all metrics than the traditional modular system, an indication that it enables users to reach their goals faster and to experience both more natural system responses and better comprehension by the dialogue system.


DOI: 10.21437/Interspeech.2018-1679

Cite as: Braunschweiler, N., Papangelis, A. (2018) Comparison of an End-to-end Trainable Dialogue System with a Modular Statistical Dialogue System. Proc. Interspeech 2018, 576-580, DOI: 10.21437/Interspeech.2018-1679.


@inproceedings{Braunschweiler2018,
  author={Norbert Braunschweiler and Alexandros Papangelis},
  title={Comparison of an End-to-end Trainable Dialogue System with a Modular Statistical Dialogue System},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={576--580},
  doi={10.21437/Interspeech.2018-1679},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1679}
}