This paper presents a new analytic method for analysing the perceptual relevance of unit selection costs and/or their sub-components, as well as for tuning the unit selection weights. The proposed method is used to investigate the behaviour of a unit-selection-based system. The outcome is applied in a simple experiment that aims to improve the speech output quality of the system by setting limits on the costs and their sub-components during the search for optimal sequences of units. The experiments reveal that a large proportion (36.17 %) of the artifacts annotated by listeners are not reflected in the values of the costs and their sub-components as currently implemented and tuned in the evaluated system.
Index Terms: speech synthesis, unit selection, concatenation cost, target cost, audible artifacts
Cite as: Matoušek, J., Tihelka, D., Legát, M. (2013) Is unit selection aware of audible artifacts? Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8), 267-271
@inproceedings{matousek13_ssw,
  author={Jindřich Matoušek and Daniel Tihelka and Milan Legát},
  title={{Is unit selection aware of audible artifacts?}},
  year=2013,
  booktitle={Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8)},
  pages={267--271}
}