16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Knowledge versus Data in TTS: Evaluation of a Continuum of Synthesis Systems

Rosie Kay (1), Oliver Watts (1), Roberto Barra Chicote (2), Cassie Mayo (1)

(1) University of Edinburgh, UK
(2) Universidad Politécnica de Madrid, Spain

Grapheme-based models have been proposed for both ASR and TTS as a way of circumventing the lack of expert-compiled pronunciation lexicons in under-resourced languages. It is commonly assumed that this approach should work well in languages whose orthographies have a transparent letter-to-phoneme relationship, such as Spanish. Our experience has shown, however, that there is still a significant difference in intelligibility between grapheme-based systems and conventional ones for this language. This paper explores the contribution of different levels of linguistic annotation to system intelligibility, and the trade-off between those levels and the quantity of data used for training. Ten systems spaced across these two continua of knowledge and data were subjectively evaluated for intelligibility.

Bibliographic reference. Kay, Rosie / Watts, Oliver / Chicote, Roberto Barra / Mayo, Cassie (2015): "Knowledge versus data in TTS: evaluation of a continuum of synthesis systems", in INTERSPEECH-2015, 3335-3339.