Eighth ISCA Workshop on Speech Synthesis
Barcelona, Catalonia, Spain
This paper presents techniques for building text-to-speech front-ends that avoid the need for language-specific expert knowledge and instead rely on universal resources (such as the Unicode character database) and unsupervised learning from unannotated data to ease system development. The acquisition of expert language-specific knowledge and expert-annotated data is a major bottleneck in the development of corpus-based TTS systems in new languages. The methods presented here sidestep the need for resources such as pronunciation lexicons, phonetic feature sets, and part-of-speech tagged data. The paper explains how the techniques are applied to the 14 languages of a corpus of found audiobook data, and presents the results of an evaluation of the intelligibility of the resulting systems.
Index Terms: multilingual speech synthesis, unsupervised learning, vector space model, text-to-speech, audiobook data
Bibliographic reference. Watts, Oliver / Stan, Adriana / Clark, Robert A. J. / Mamiya, Yoshitaka / Giurgiu, Mircea / Yamagishi, Junichi / King, Simon (2013): "Unsupervised and lightly-supervised learning for rapid construction of TTS systems in multiple languages from found data: evaluation and analysis", In SSW8, 101-106.