Sixth European Conference on Speech Communication and Technology
An improved version of the Slovene text-to-speech system S5 is described. S5 can be used either as a stand-alone reading system or integrated into other applications. It is based on the concatenation of basic speech units, diphones, using the TD-PSOLA technique. The input text is transformed into its spoken equivalent by a series of modules. F0 modeling is based primarily on predicting the appropriate tonemic accent. Phone duration is predicted by a two-level approach that takes into account how speeding up or slowing down affects the duration of individual phones. The adequacy of the spoken output was evaluated in several subjective tests recommended by the International Telecommunication Union (ITU).
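As a rough illustration of what concatenating diphones involves, the sketch below maps a phone sequence to the diphone units that would cover it. It is not taken from S5: the function name `to_diphones`, the silence symbol `_`, the `left-right` unit naming, and the example phone string are all assumptions made for this example, and the TD-PSOLA signal processing that would join the recorded units is not shown.

```python
def to_diphones(phones, silence="_"):
    """Return the names of the diphone units that cover a phone sequence.

    A diphone spans the transition from one phone to the next, so a
    sequence of N phones padded with silence on both sides needs N + 1 units.
    The "left-right" naming scheme is hypothetical, not S5's inventory format.
    """
    padded = [silence, *phones, silence]
    # Pair each phone with its right-hand neighbour.
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


# Illustrative phone string; a real system would take it from the
# grapheme-to-phoneme module.
units = to_diphones(["m", "a", "t", "i"])
print(units)  # ['_-m', 'm-a', 'a-t', 't-i', 'i-_']
```

In a concatenative synthesizer, each unit name would then index a recorded segment whose pitch and duration are modified to match the predicted F0 contour and phone durations.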
Bibliographic reference. Pavesic, N. / Gros, Jerneja (1999): "S5: the SQEL slovene speech synthesis system", In EUROSPEECH'99, 2103-2106.