EUROSPEECH 2003 - INTERSPEECH 2003
The simulation of speech by means of speech synthesis involves, among other things, the ability to mimic the typical delivery of different speech styles. This requires a realistic imitation of the way speakers organize their information flow in time (i.e., word grouping boundaries), as well as of their speech rate and its variations. The originality of our model rests on two points. First, it assumes that the temporal component plays a dominant role in the simulation of speech rhythm, whereas traditional language models mostly set temporal issues aside. Second, the output of our temporal modeling, based on statistical analysis and qualitative parameters, results from the harmonization of several layers (segmental, syllabic, phrasal). The benefit of such a multidimensional model is that subtle quantitative and qualitative effects can be imposed at various levels, which is key to respecting a specific language system and to preserving speech coherence and fluency across speech styles.
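The abstract does not give the model's equations. As a purely illustrative sketch of what combining several temporal layers can look like, the snippet below assumes a simple multiplicative scheme in the spirit of rule-based duration models. A segment's intrinsic duration is stretched by a syllable-level factor (stress) and a phrase-level factor (lengthening before a word-grouping boundary), then scaled by a global speech-rate factor. The class `Segment`, the function `segment_duration`, and the constants `STRESS_FACTOR` and `PHRASE_FINAL_FACTOR` are hypothetical, and their values are placeholders, not figures from the paper. This is not the authors' model, which combines statistical analysis with qualitative parameters.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    phone: str
    base_ms: float           # hypothetical intrinsic (segmental-layer) duration in ms
    syllable_stressed: bool  # syllabic layer: segment belongs to a stressed syllable
    phrase_final: bool       # phrasal layer: segment precedes a word-grouping boundary


# Hypothetical layer factors: illustrative placeholders, not values from the paper.
STRESS_FACTOR = 1.2
PHRASE_FINAL_FACTOR = 1.4


def segment_duration(seg: Segment, rate: float = 1.0) -> float:
    """Combine segmental, syllabic and phrasal layers multiplicatively,
    then divide by a global speech-rate factor (rate > 1 means faster speech)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    d = seg.base_ms                 # segmental layer
    if seg.syllable_stressed:
        d *= STRESS_FACTOR          # syllabic layer
    if seg.phrase_final:
        d *= PHRASE_FINAL_FACTOR    # phrasal layer
    return d / rate                 # global speech rate


if __name__ == "__main__":
    utterance = [
        Segment("t", 60.0, False, False),
        Segment("a", 100.0, True, False),
        Segment("n", 70.0, False, True),
    ]
    for rate in (1.0, 1.25):
        print(rate, [round(segment_duration(s, rate), 1) for s in utterance])
```

The multiplicative form is chosen only for readability: each layer's effect stays separable, so changing the speech rate rescales every segment while preserving the relative lengthening imposed by the syllabic and phrasal layers.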
Bibliographic reference. Keller, Brigitte Zellner (2003): "The temporal organisation of speech as gauged by speech synthesis", In EUROSPEECH-2003, 2569-2572.