Grapheme-based models have been proposed for both ASR and TTS as a way of circumventing the lack of expert-compiled pronunciation lexicons in under-resourced languages. It is commonly assumed that this approach should work well for languages whose orthographies have a transparent letter-to-phoneme relationship, such as Spanish. Our experience has shown, however, that for Spanish a significant intelligibility gap remains between grapheme-based systems and conventional ones. This paper explores how different levels of linguistic annotation contribute to system intelligibility, and the trade-off between those levels and the quantity of training data. Ten systems spaced across these two continua, knowledge and data, were evaluated subjectively for intelligibility.
Bibliographic reference. Kay, Rosie / Watts, Oliver / Barra-Chicote, Roberto / Mayo, Cassie (2015): "Knowledge versus data in TTS: evaluation of a continuum of synthesis systems", In INTERSPEECH-2015, 3335-3339.