We describe an approach to crowdsourcing the evaluation of TTS systems using preference tests and report on lessons learnt from running 127 real-life crowdsourced tests. We show that at least one type of cheating becomes more prevalent over time if left unchecked, and we develop metrics to exclude cheaters. We demonstrate that excluding them improves test outcomes.
Bibliographic reference. Buchholz, Sabine / Latorre, Javier (2011): "Crowdsourcing preference tests, and how to detect cheating", In INTERSPEECH-2011, 3053-3056.