An improved version of the Slovene text-to-speech system S5 is described. S5 can be used as a stand-alone reading system or integrated into other applications. It synthesizes speech by concatenating diphones, its basic speech units, using the TD-PSOLA technique. The input text is transformed into its spoken equivalent by a series of modules. F0 modeling is based primarily on predicting the appropriate tonemic accent. Phone duration is predicted by a two-level approach that takes into account how speeding up or slowing down affects the duration of individual phones. The adequacy of the spoken output was evaluated with several subjective tests recommended by the International Telecommunication Union (ITU).
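The two-level idea behind the duration prediction can be illustrated with a minimal sketch. It is not the S5 implementation: the abstract gives no formulas, so the `Phone` fields, the `elasticity` weighting, and both function names are hypothetical and chosen only for illustration. The first level sets a target duration for a whole unit from the speaking rate; the second level distributes the resulting stretch or compression over the individual phones, so that more elastic phones (for example vowels) absorb more of the change.

```python
from dataclasses import dataclass


@dataclass
class Phone:
    symbol: str
    intrinsic_ms: float  # hypothetical baseline duration at normal tempo
    elasticity: float    # hypothetical: how strongly the phone reacts to tempo changes


def predict_unit_duration(phones, rate=1.0):
    """Level 1 (assumed form): target duration of the whole unit for a speaking rate."""
    return sum(p.intrinsic_ms for p in phones) / rate


def distribute_duration(phones, target_ms):
    """Level 2 (assumed form): stretch or compress phones so they sum to target_ms."""
    base = sum(p.intrinsic_ms for p in phones)
    weight = sum(p.intrinsic_ms * p.elasticity for p in phones)
    if weight == 0:
        # No phone can change, so the intrinsic durations are returned unchanged.
        return [p.intrinsic_ms for p in phones]
    k = (target_ms - base) / weight
    return [p.intrinsic_ms * (1 + k * p.elasticity) for p in phones]


# Illustrative values only, not measured Slovene phone durations.
word = [Phone("m", 60.0, 0.5), Phone("a", 100.0, 1.0)]
target = predict_unit_duration(word, rate=1.25)
durations = distribute_duration(word, target)
```

In this sketch the phone durations always sum to the target, and the phone with the higher elasticity changes proportionally more. A practical system would also need a lower bound on each duration, since strong compression can drive the linear formula toward zero or below.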
Cite as: Pavesic, N., Gros, J. (1999) S5: the SQEL slovene speech synthesis system. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 2103-2106, doi: 10.21437/Eurospeech.1999-467
@inproceedings{pavesic99_eurospeech,
  author={N. Pavesic and Jerneja Gros},
  title={{S5: the SQEL slovene speech synthesis system}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={2103--2106},
  doi={10.21437/Eurospeech.1999-467}
}