Danish has a distinctive vowel length opposition that is realized with only small differences in vowel quality. This paper investigates whether this fact can be used to reduce the size of the speech unit database in a high-quality concatenative text-to-speech system for Danish. The purpose is to evaluate the concept of synthesizing short vowels from the corresponding long vowels. If this proves successful, the size of the speech unit database may be reduced by approximately 40%. An acoustic analysis of the long and short vowels in the present speech unit database was performed. The results, presented in an F1-F2 plot, demonstrate a significant overlap between long and short vowels. Consequently, two different strategies for synthesizing the short vowels from their long counterparts were tested. The first strategy used resegmented long vowels, and the second relied entirely on the time-scaling technique built into the signal generation module. In a comprehensive listening test, the two strategies were compared with the use of prerecorded short vowels. The results of the listening test are based on 32 subjects judging intelligibility and naturalness. They show no significant differences between the prerecorded short vowels and the resegmented long vowels synthesized as short vowels. The resegmented long vowels will be implemented in the present text-to-speech system for further testing.
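The abstract does not say how the long vowels were resegmented, so the following is only an illustrative sketch of one way a long-vowel unit could be shortened: keep the onset and offset of the segment, drop the middle, and crossfade across the join. The function name `shorten_vowel`, the parameters `target_len` and `fade_len`, and the linear crossfade are all assumptions made for this example, not the authors' method.

```python
def shorten_vowel(samples, target_len, fade_len=32):
    """Hypothetical helper: shorten a vowel segment to target_len samples.

    Keeps the start and end of the segment, removes the middle, and
    blends the two kept parts with a linear crossfade (an assumption;
    the paper's actual resegmentation procedure is not described here).
    """
    x = [float(s) for s in samples]
    n = len(x)
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    if target_len >= n:
        return x  # nothing to remove

    # The crossfade cannot be longer than the removed part or the output.
    fade = max(0, min(fade_len, n - target_len, target_len))

    # Head and tail overlap by `fade` samples in the output.
    head_len = (target_len + fade) // 2
    tail_len = target_len + fade - head_len
    head = x[:head_len]
    tail = x[n - tail_len:]

    # Linear crossfade from the end of the head into the start of the tail.
    blend = []
    for i in range(fade):
        w = (i + 1) / (fade + 1)
        blend.append(head[head_len - fade + i] * (1.0 - w) + tail[i] * w)

    return head[:head_len - fade] + blend + tail[fade:]
```

For example, `shorten_vowel(long_vowel, target_len=len(long_vowel) // 2)` would return a segment half as long that still starts and ends on the original samples. A real system would more plausibly cut at pitch-synchronous points rather than at arbitrary sample positions.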
Cite as: Andersen, O., Dyhr, N.-J., Engberg, I.S., Nielsen, C. (1998) Synthesising short vowels from their long counterparts in a concatenative based text-to-speech system. Proc. 3rd ESCA/COCOSDA Workshop on Speech Synthesis (SSW 3), 165-170
@inproceedings{andersen98_ssw,
  author={Ove Andersen and N.-J. Dyhr and I. S. Engberg and C. Nielsen},
  title={{Synthesising short vowels from their long counterparts in a concatenative based text-to-speech system}},
  year=1998,
  booktitle={Proc. 3rd ESCA/COCOSDA Workshop on Speech Synthesis (SSW 3)},
  pages={165--170}
}