Unit selection speech synthesis systems offer good overall quality, but this may be offset by the sporadic and unpredictable occurrence of audible artifacts, such as discontinuities in F0 and the spectrum. Informal observations suggested that such breaks may affect perceived vowel duration. This study therefore investigates the effect of F0 and formant discontinuities on the perceived duration of vowels in Czech synthetic speech. Ten manipulations of F0, F1 and F2 were performed on target vowels in short synthesized phrases, creating abrupt breaks in the contours at the midpoint of the vowels. In a two-alternative forced-choice (2AFC) task, listeners decided in which phrase the last syllable was longer. The results showed that, despite the identical duration of the compared stimuli, vowels whose second part was manipulated towards centralized (i.e., less peripheral) values were systematically judged to be shorter than stimuli without such discontinuities, and vice versa. However, this influence seems to be distinct from that of an overall formant change (without a discontinuity), since a control stimulus in which the manipulation was applied across the entire vowel was not perceived as significantly shorter or longer. No effect of F0 manipulations was observed.
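As an illustration of how responses from a 2AFC task of this kind might be checked against chance, the following minimal sketch computes an exact two-sided binomial p-value. It is not the analysis used in the study, which the abstract does not specify; the function name `two_sided_binomial_p` and the response counts are hypothetical and chosen only for demonstration.

```python
from math import comb


def two_sided_binomial_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    k: number of trials in which one alternative was chosen (hypothetical data)
    n: total number of trials
    p: probability of that choice under the null hypothesis (0.5 = chance in 2AFC)
    """
    # Probability of every possible outcome 0..n under the null hypothesis.
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Sum the probabilities of all outcomes that are no more likely than the
    # observed one; the small tolerance guards against floating-point ties.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical example: the manipulated stimulus was judged "shorter"
    # in 8 of 10 trials. These counts are NOT taken from the study.
    print(two_sided_binomial_p(8, 10))
```

With only ten hypothetical trials, even an 8-to-2 split is not significant at the conventional 0.05 level, which is one reason listening tests of this type typically pool many responses per condition.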
Cite as: Bořil, T., Šturm, P., Skarnitzl, R., Volín, J. (2017) Effect of Formant and F0 Discontinuity on Perceived Vowel Duration: Impacts for Concatenative Speech Synthesis. Proc. Interspeech 2017, 2998-3002, doi: 10.21437/Interspeech.2017-1161
@inproceedings{boril17_interspeech, author={Tomáš Bořil and Pavel Šturm and Radek Skarnitzl and Jan Volín}, title={{Effect of Formant and F0 Discontinuity on Perceived Vowel Duration: Impacts for Concatenative Speech Synthesis}}, year=2017, booktitle={Proc. Interspeech 2017}, pages={2998--3002}, doi={10.21437/Interspeech.2017-1161} }