End-to-End Multi-Level Dialog Act Recognition

Eugénio Ribeiro, Ricardo Ribeiro, David Martins de Matos


The three-level dialog act annotation scheme of the DIHANA corpus poses a multi-level classification problem in which the bottom levels allow multiple or no labels for a single segment. We address automatic dialog act recognition on all three levels with an end-to-end approach in order to implicitly capture the relations between them. Our deep neural network classifier combines word- and character-based segment representation approaches with a summary of the dialog history and information concerning speaker changes. We show that it is important to specialize the generic segment representation in order to capture the most relevant information for each level. In contrast, the summary of the dialog history should combine information from the three levels to capture the dependencies between them. Moreover, the labels predicted for each level help predict those of the lower levels. Overall, our results surpass those of our previous approach, which hierarchically combined three independent per-level classifiers. They even surpass the results achieved by previous studies on a simplified version of the problem, which neglected the multi-label nature of the bottom levels and only considered the label combinations present in the corpus.
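The abstract describes the output side of the classifier only in words. The stdlib-only Python sketch below illustrates one way such an end-to-end multi-level output could be structured. The top level gets exactly one label through a softmax. The two bottom levels use independent sigmoids, so a segment can receive zero, one, or several labels. Each lower level also receives the outputs of the levels above it as extra input. This is not the authors' implementation. The layer sizes, random weights, the 0.5 threshold, and every function name are hypothetical, and the word- and character-based segment representation, dialog history summary, and speaker-change information are assumed to be already folded into a single input vector.

```python
import math
import random

# Hypothetical sizes, chosen for illustration only (not taken from the paper).
FEATURE_DIM = 8          # size of the precomputed segment + context vector
LEVEL_SIZES = [3, 4, 5]  # number of labels on the top, middle and bottom levels


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dense(weights, bias, inputs):
    """Plain fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def build_model(seed=0):
    """One output layer per level; each lower layer also sees the upper outputs."""
    rng = random.Random(seed)
    layers = []
    extra = 0  # total width of the outputs of all upper levels
    for size in LEVEL_SIZES:
        layers.append(init_layer(size, FEATURE_DIM + extra, rng))
        extra += size
    return layers


def predict(layers, features, threshold=0.5):
    """Return one set of label indices per level, top level first."""
    upper_outputs = []
    predictions = []
    for level, (weights, bias) in enumerate(layers):
        logits = dense(weights, bias, features + upper_outputs)
        if level == 0:
            # Top level: exactly one label, chosen with a softmax.
            probs = softmax(logits)
            best = max(range(len(probs)), key=probs.__getitem__)
            labels = {best}
        else:
            # Bottom levels: independent sigmoids allow zero or several labels.
            probs = [sigmoid(z) for z in logits]
            labels = {i for i, p in enumerate(probs) if p >= threshold}
        predictions.append(labels)
        # Pass this level's probabilities down to the levels below it.
        upper_outputs = upper_outputs + probs
    return predictions


model = build_model(seed=0)
segment_vector = [0.1 * i for i in range(FEATURE_DIM)]
print(predict(model, segment_vector))
```

The sketch passes the upper levels' probabilities, rather than hard labels, to the lower levels. This is one possible design choice that keeps the whole stack trainable end to end. Treating every label combination seen in the corpus as a single class would instead replace the sigmoids with one softmax over those combinations, which is the simplification the abstract contrasts against.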


DOI: 10.21437/IberSPEECH.2018-63

Cite as: Ribeiro, E., Ribeiro, R., Martins de Matos, D. (2018) End-to-End Multi-Level Dialog Act Recognition. Proc. IberSPEECH 2018, 301-305, DOI: 10.21437/IberSPEECH.2018-63.


@inproceedings{Ribeiro2018,
  author={Eugénio Ribeiro and Ricardo Ribeiro and David {Martins de Matos}},
  title={{End-to-End Multi-Level Dialog Act Recognition}},
  year=2018,
  booktitle={Proc. IberSPEECH 2018},
  pages={301--305},
  doi={10.21437/IberSPEECH.2018-63},
  url={http://dx.doi.org/10.21437/IberSPEECH.2018-63}
}