Assessing Speaker Engagement in 2-Person Debates: Overlap Detection in United States Presidential Debates

Midia Yousefi, Navid Shokouhi, John H.L. Hansen


Co-channel speech contains significant amounts of overlap, which can degrade the intelligibility and quality of the desired speech. Convolutive Non-negative Matrix Factorization (CNMF) has been shown to be a successful approach for detecting overlap by extracting speaker-specific acoustic basis dimensions from an audio stream. While CNMF has produced successful results, it requires isolated single-speaker recordings for each speaker to derive the corresponding basis functions/dimensions. In our previous work, a Teager-Kaiser Energy Operator (TEO)-based Pyknogram was introduced. In this study, Pyknogram- and CNMF-based solutions for overlap detection within audio streams are examined using the GRID dataset. The TEO-based Pyknogram is shown to achieve an 8-10% relative reduction in Equal Error Rate (EER) compared to CNMF features. In addition, a secondary evaluation was performed on naturalistic audio streams containing overlap. Specifically, we collected a real-world audio database of US Presidential debates from the last 12 years, which is highly challenging due to various forms of overlap, changing Signal-to-Interference Ratio (SIR), and environmental noise, among other factors. Our experiments indicate that the TEO-based Pyknogram is well suited for detecting overlap in challenging real-world scenarios such as the US Presidential debates.
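For context, the discrete Teager-Kaiser Energy Operator underlying the Pyknogram features has a simple closed form, psi[x(n)] = x(n)^2 - x(n-1)*x(n+1). The sketch below is an illustrative implementation of that operator only (not the authors' Pyknogram pipeline); the tone frequency and sample rate are arbitrary example values.

```python
import numpy as np

def teager_kaiser_energy(x):
    """Discrete Teager-Kaiser Energy Operator:
    psi[x(n)] = x(n)^2 - x(n-1) * x(n+1).
    Returns an array two samples shorter than the input."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# For a pure tone x(n) = A*cos(omega*n), the operator evaluates exactly
# to A^2 * sin^2(omega): it tracks amplitude and frequency jointly,
# which is what makes TEO useful for speech-energy features.
fs = 8000                                  # example sample rate (Hz)
n = np.arange(fs)
tone = np.cos(2 * np.pi * 440 * n / fs)    # example 440 Hz tone
psi = teager_kaiser_energy(tone)
```

For the pure cosine above, `psi` is constant at sin^2(2*pi*440/fs), consistent with the closed-form result.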


DOI: 10.21437/Interspeech.2018-1463

Cite as: Yousefi, M., Shokouhi, N., Hansen, J.H. (2018) Assessing Speaker Engagement in 2-Person Debates: Overlap Detection in United States Presidential Debates. Proc. Interspeech 2018, 2117-2121, DOI: 10.21437/Interspeech.2018-1463.


@inproceedings{Yousefi2018,
  author={Midia Yousefi and Navid Shokouhi and John H.L. Hansen},
  title={Assessing Speaker Engagement in 2-Person Debates: Overlap Detection in United States Presidential Debates},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2117--2121},
  doi={10.21437/Interspeech.2018-1463},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1463}
}