Behavioral and mental health research and its clinical applications rely widely on quantifying human behavioral expressions. This often requires human-derived behavioral annotations, which tend to be noisy, especially when the psychological objects of interest are latent and subjective in nature. This paper focuses on exploiting multiple human annotations to improve the reliability of the ensemble decision by creating a ranking of the evaluated objects. To create this ranking, we employ an adapted version of Copeland’s counting method, which yields robust inter-annotator rankings and agreement. To map the ranked objects back onto the evaluation scale, we use a simple maximum-likelihood mapping that preserves the original distribution of ratings. Since the ratings we study lack a ground truth, we assess our algorithm in two ways: (1) by corrupting the annotations with different distributions of noise and computing the inter-annotator agreement between the ensemble estimates derived from the original and corrupted data using Krippendorff’s α; and (2) by replacing one annotator at a time with the ensemble estimate. Our results suggest that the proposed method provides a robust alternative that suffers less from individual annotator preferences/biases and scale misuse.
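To make the two main steps concrete, the sketch below illustrates the general technique the abstract describes: a Copeland-style count over pairwise annotator "votes" to rank items, followed by a mapping of ranks back onto the rating scale. This is a minimal illustration, not the authors' exact adaptation: the tie-handling rule is an assumption, and the distribution-preserving step uses simple quantile matching as a stand-in for the paper's maximum-likelihood mapping. All names (`copeland_rank`, `ranks_to_scale`) and the toy data are hypothetical.

```python
# A minimal sketch (not the paper's exact adaptation) of Copeland-style
# ranking from multiple annotators' ratings, plus a distribution-preserving
# mapping of ranks back onto the rating scale.
import numpy as np

def copeland_rank(ratings):
    """ratings: (n_annotators, n_items) array of ordinal ratings.

    For every pair of items (i, j), each annotator casts a vote based on
    which item they rated higher. Item i beats item j if it wins the
    majority of votes; the Copeland score is wins minus losses over all
    pairs. Returns item indices ordered from lowest to highest score.
    """
    n_items = ratings.shape[1]
    score = np.zeros(n_items)
    for i in range(n_items):
        for j in range(i + 1, n_items):
            votes = np.sign(ratings[:, i] - ratings[:, j]).sum()
            if votes > 0:        # majority prefers item i over item j
                score[i] += 1
                score[j] -= 1
            elif votes < 0:      # majority prefers item j over item i
                score[j] += 1
                score[i] -= 1
            # exact ties contribute nothing (an assumed rule; others exist)
    return np.argsort(score), score

def ranks_to_scale(order, ratings):
    """Map ranked items back to the original rating scale so the ensemble
    estimates reproduce the pooled empirical distribution of ratings
    (quantile matching; the paper instead uses an ML-based mapping).
    """
    pooled = np.sort(ratings.ravel())
    n_items = ratings.shape[1]
    # one evenly spaced quantile of the pooled ratings per rank position
    idx = np.linspace(0, len(pooled) - 1, n_items).round().astype(int)
    estimates = np.empty(n_items)
    estimates[order] = pooled[idx]  # lowest rank -> lowest quantile
    return estimates

# Toy usage: 3 annotators rating 4 items on a 1-9 scale.
ratings = np.array([[2, 7, 5, 9],
                    [3, 6, 6, 8],
                    [1, 8, 4, 9]])
order, score = copeland_rank(ratings)
print("Copeland scores:", score)               # [-3.  1. -1.  3.]
print("Ensemble estimates:", ranks_to_scale(order, ratings))
```

Because the final estimates are drawn from the pooled ratings themselves, the ensemble output stays on the annotators' scale and mirrors its distribution, which is the property the abstract highlights.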
Cite as: Mundnich, K., Nasir, M., Georgiou, P., Narayanan, S.S. (2017) Exploiting Intra-Annotator Rating Consistency Through Copeland’s Method for Estimation of Ground Truth Labels in Couples’ Therapy. Proc. Interspeech 2017, 3167-3171, doi: 10.21437/Interspeech.2017-1599
@inproceedings{mundnich17_interspeech,
  author    = {Karel Mundnich and Md. Nasir and Panayiotis Georgiou and Shrikanth S. Narayanan},
  title     = {{Exploiting Intra-Annotator Rating Consistency Through Copeland’s Method for Estimation of Ground Truth Labels in Couples’ Therapy}},
  year      = {2017},
  booktitle = {Proc. Interspeech 2017},
  pages     = {3167--3171},
  doi       = {10.21437/Interspeech.2017-1599}
}