We present initial attempts at text-to-speech conversion based on standard orthographic units, as part of a larger scheme for training TTS systems on features that can be trivially extracted from text. We evaluate whether decision-tree-based context clustering, conventionally used for parameter tying in HMM-based systems, can handle letter-to-sound conversion. We also apply a method of compound-feature discovery to corpus-based speech synthesis. Finally, we present an evaluation of the intelligibility of letter-based systems and more conventional phoneme-based systems.
Index Terms: Statistical parametric speech synthesis, HMM-based speech synthesis, letter-to-sound conversion, graphemes
Cite as: Watts, O., Yamagishi, J., King, S. (2010) Letter-based speech synthesis. Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7), 317-322
@inproceedings{watts10_ssw, author={Oliver Watts and Junichi Yamagishi and Simon King}, title={{Letter-based speech synthesis}}, year=2010, booktitle={Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7)}, pages={317--322} }