Designing text scripts that cover enough phonetic units and prosodic phenomena is essential when recording a speech database for corpus-based speech synthesis. In script design for such databases, most effort usually goes into achieving maximal coverage of phonetic units with a minimal amount of recorded speech. Methods of this kind often select sentences that contain difficult words or incorrect grammar, which speakers find hard to read correctly and naturally. The selected sentences may also be unsuitable for child speakers or non-native speakers. To address these problems, we propose to consider readability in text selection. The experiment shows that scripts selected with the proposed method have good unit coverage of the language and good readability.
Index Terms: Text-to-speech, recording scripts, text selection, text readability
Cite as: Dong, M., Cen, L., Chan, P., Li, H. (2010) Considering readability in text-to-speech recording script design. Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7), 312-316
@inproceedings{dong10_ssw, author={Minghui Dong and Ling Cen and Paul Chan and Haizhou Li}, title={{Considering readability in text-to-speech recording script design}}, year=2010, booktitle={Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7)}, pages={312--316} }