The Seventh ISCA Tutorial and Research Workshop on Speech Synthesis

Kyoto, Japan
September 22-24, 2010

Evaluating Speech Synthesis Intelligibility using Amazon Mechanical Turk

Maria K. Wolters, Karl B. Isaac, Steve Renals

Centre for Speech Technology Research, University of Edinburgh, UK

Microtask platforms such as Amazon Mechanical Turk (AMT) are increasingly used to create speech and language resources. AMT in particular allows researchers to quickly recruit a large number of demographically fairly diverse participants. In this study, we investigated whether AMT can be used for comparing the intelligibility of speech synthesis systems. We conducted two experiments, each run both in the lab and via AMT: one comparing US English diphone synthesis to US English speaker-adaptive HTS synthesis, and one comparing UK English unit selection synthesis to UK English speaker-dependent HTS synthesis. While AMT word error rates were worse than lab error rates, AMT results were more sensitive to relative differences between systems, mainly because of the larger number of listeners. Boxplots and multilevel modelling allowed us to identify listeners who performed particularly badly, while thresholding was sufficient to eliminate rogue workers. We conclude that AMT is a viable platform for synthetic speech intelligibility comparisons.
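The two quantities the abstract relies on, word error rate and a threshold for excluding rogue workers, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `(worker, reference, transcript)` input layout, and the cut-off value `max_mean_wer=0.5` are assumptions made for illustration, and the abstract does not state the threshold actually used.

```python
from statistics import mean


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Standard Levenshtein distance, computed over words instead of characters.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref) if ref else 0.0


def filter_workers(responses, max_mean_wer=0.5):
    """Keep workers whose mean WER is at or below a (hypothetical) threshold.

    `responses` is an iterable of (worker_id, reference, transcript) tuples.
    Returns a dict mapping each retained worker to their mean WER.
    """
    per_worker = {}
    for worker, reference, transcript in responses:
        per_worker.setdefault(worker, []).append(
            word_error_rate(reference, transcript))
    means = {worker: mean(rates) for worker, rates in per_worker.items()}
    return {worker: m for worker, m in means.items() if m <= max_mean_wer}
```

A simple cut-off like this only removes workers whose transcripts are almost entirely wrong; identifying listeners who merely perform poorly would need the boxplot and multilevel-modelling analysis that the abstract describes.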

Index Terms: intelligibility, evaluation, semantically unpredictable sentences, diphone, unit selection, crowdsourcing, Mechanical Turk, HMM-based synthesis


Bibliographic reference. Wolters, Maria K. / Isaac, Karl B. / Renals, Steve (2010): "Evaluating speech synthesis intelligibility using Amazon Mechanical Turk", in SSW7-2010, 136-141.