Phonetics and Phonology of Speaking Styles: Reduction and Elaboration in Speech Communication
Barcelona, Catalonia, Spain
This paper describes work done on the prosodic rules of the Edinburgh University CSTR Text-to-Speech system, a linguistically sophisticated speech output system. Like most other high-quality text-to-speech systems, our system bases its prosodic rules on the content-word/function-word (CW/FW) distinction. The first step in determining accentuation is to assign accent to (almost) all CWs, resulting in an unnaturally over-accented representation. It is therefore necessary to apply a RHYTHM RULE to this representation in order to derive a more natural accentuation. Our rhythm rule has several stages, each producing successively more reduced prosody. The first stage produces what we call the fully-accented representation, from which various degrees of reduction can be derived: this stage is obligatory, and all subsequent stages are optional. The choice of which additional stages to apply depends on the speech rate and style to be synthesised; the output can range from laboriously stilted to extremely reduced.
The various stages and their effects are presented, and their relation to pitch and duration in the acoustic output is discussed. Some further work on remaining problems is also suggested.
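As a rough illustration of the pipeline outlined above (accent all CWs, then apply one obligatory and any number of optional reduction stages), the following minimal Python sketch may help. It is not the CSTR implementation: the function-word list, the function names, and above all the reduction rule itself (drop every second accent, counting leftwards from the last one) are assumptions made purely for illustration, since the abstract does not specify what each stage does.

```python
# Hypothetical, tiny function-word list; a real system would use a lexicon.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "on", "and", "is", "it"}


def assign_initial_accents(words):
    """Accent every content word, giving the over-accented representation."""
    return [(w, w.lower() not in FUNCTION_WORDS) for w in words]


def reduce_once(accented):
    """One reduction pass (assumed rule, not taken from the paper):
    keep the last accent and drop every second accent to its left."""
    idx = [i for i, (_, acc) in enumerate(accented) if acc]
    drop = set(idx[-2::-2])
    return [(w, acc and i not in drop) for i, (w, acc) in enumerate(accented)]


def rhythm_rule(accented, optional_stages=0):
    """Apply the obligatory first stage, then the requested optional stages.
    Modelling every stage as the same pass is a simplifying assumption."""
    result = reduce_once(accented)
    for _ in range(optional_stages):
        result = reduce_once(result)
    return result


def accented_words(accented):
    """Return only the words that still carry an accent."""
    return [w for w, acc in accented if acc]


if __name__ == "__main__":
    base = assign_initial_accents("the big red old house fell".split())
    for n in range(3):
        print(n, accented_words(rhythm_rule(base, optional_stages=n)))
```

In this sketch, a higher `optional_stages` value stands in for a faster or more casual speaking style: each additional pass leaves fewer accents in the utterance.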
Bibliographic reference. Monaghan, Alex I. C. (1991): "Accentuation and speech rate in the CSTR TTS system", in PPoSpSt-1991, paper 041.