September 22-25, 1997
This paper presents a stochastic model of French intonation contours for use in text-to-speech synthesis. The model has two modules: a linguistic module that generates prosodic labels from text, and a phonetic module that generates an F0 curve from the prosodic labels. The model differs from previous work in the prosodic labels it uses, which can be derived automatically from the training corpus. This makes it possible to train on large corpora, or on several corpora of different speech styles, and makes the model easy to adapt to new languages. The present paper focuses on the linguistic module, which does not require a full syntactic analysis of the text and relies only on part-of-speech tagging. The results were validated by a perception test, which showed that listeners did not perceive a significant difference in quality between sentences synthesized with the original F0 curve (from a recording) and those synthesized with the model-generated curve. The proposed model thus appears to capture a large part of the grammatical information needed to generate F0.
Bibliographic reference. Veronis, Jean / Di Cristo, Philippe / Courtois, Fabienne / Lagrue, Benoit (1997): "A stochastic model of intonation for French text-to-speech synthesis", In EUROSPEECH-1997, 2643-2646.