Combined Speaker Clustering and Role Recognition in Conversational Speech

Nikolaos Flemotomos, Pavlos Papadopoulos, James Gibson, Shrikanth Narayanan


Speaker Role Recognition (SRR) is usually addressed either as an independent classification task or as a step following a speaker clustering module. However, the first approach does not take speaker-specific variabilities into account, while the second suffers from error propagation. In this work we propose integrating an audio-based speaker clustering algorithm with a language-aided role recognizer into a meta-classifier that takes both modalities into account. This way, speaker-specific and role-specific characteristics can be treated separately before the relevant information is combined. The method is evaluated on two corpora, recorded under different conditions, of interactions between a clinician and a patient, and it is shown to yield superior results on the SRR task.
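The abstract describes fusing audio-based clustering scores with language-based role scores in a meta-classifier. The paper's actual features and models are not specified here, so the following is only a minimal sketch of the general idea: per-segment posteriors from a (hypothetical) audio module and a (hypothetical) text module are stacked as features for a simple logistic-regression meta-classifier. All data below is synthetic.

```python
# Hedged sketch of multimodal score fusion for role recognition.
# Both "modality" score matrices are synthetic stand-ins; the real
# system's clustering and language models are not reproduced here.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Synthetic per-segment scores for 2 roles (e.g. clinician / patient):
n = 200
labels = rng.integers(0, 2, size=n)  # true role per segment
one_hot = np.eye(2)[labels]
audio_scores = softmax(rng.normal(size=(n, 2)) + 1.5 * one_hot)
text_scores = softmax(rng.normal(size=(n, 2)) + 1.5 * one_hot)

# Meta-classifier input: stacked posteriors from both modalities.
X = np.hstack([audio_scores, text_scores])

# Plain logistic regression trained by gradient descent.
w = np.zeros(X.shape[1])
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    g = p - labels
    w -= 0.1 * (X.T @ g) / n
    b -= 0.1 * g.mean()

pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
accuracy = (pred == labels).mean()
print(f"meta-classifier accuracy on synthetic data: {accuracy:.2f}")
```

Since each synthetic modality is individually informative but noisy, the stacked meta-classifier illustrates how combining the two score streams can outperform either alone, which is the intuition behind the fusion described above.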


DOI: 10.21437/Interspeech.2018-1654

Cite as: Flemotomos, N., Papadopoulos, P., Gibson, J., Narayanan, S. (2018) Combined Speaker Clustering and Role Recognition in Conversational Speech. Proc. Interspeech 2018, 1378-1382, DOI: 10.21437/Interspeech.2018-1654.


@inproceedings{Flemotomos2018,
  author={Nikolaos Flemotomos and Pavlos Papadopoulos and James Gibson and Shrikanth Narayanan},
  title={Combined Speaker Clustering and Role Recognition in Conversational Speech},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1378--1382},
  doi={10.21437/Interspeech.2018-1654},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1654}
}