Data-driven prosodic models suffer from data sparsity and can be difficult to generalise. Rule-based models suffer from being over-prescriptive and insensitive to the contents of the unit selection database. To further complicate matters, the space of acceptable prosody for any one utterance is large. However, in some cases the prosodic patterns of a particular speaker can be very homogeneous, for example the pattern used to read out a zip code. In this paper we describe a method for exploring and analysing the prosodic space within a limited domain, and a method for merging a simple rule-based prosodic model with a set of data-driven mini prosodic models. A listening test comparing synthesised zip codes with and without the mini models gave promising results. The approach could be applied effectively to domains ranging from numerical amounts to personal names.
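The idea of merging a general rule-based model with domain-specific mini models can be illustrated with a small sketch. This is not the method from the paper, whose details the abstract does not give. Every name here is hypothetical: the `Target` fields (a relative pitch value `f0` and a `pause` flag), the declination rule in `rule_targets`, regex-based domain detection, per-position averaging with a majority vote for breaks in `MiniModel`, and the example values. The sketch only shows the general shape. When a chunk of text falls in a homogeneous domain such as a 5-digit zip code, a pattern learned from that domain's recordings overrides the default rule, and everything else falls back to the rule.

```python
import re
from dataclasses import dataclass
from statistics import mean


@dataclass
class Target:
    f0: float    # hypothetical: pitch relative to the speaker's mean (1.0 = mean)
    pause: bool  # hypothetical: phrase break after this unit


def rule_targets(units):
    """Hypothetical default rule: steady pitch declination, break only at the end."""
    n = len(units)
    return [Target(f0=round(1.1 - 0.2 * i / max(n - 1, 1), 3), pause=(i == n - 1))
            for i in range(n)]


class MiniModel:
    """Hypothetical data-driven mini model: per-position averages over recorded
    examples from one homogeneous domain (here, 5-digit zip codes)."""

    def __init__(self, pattern, examples):
        self.pattern = re.compile(pattern)
        self.targets = []
        for i in range(len(examples[0])):
            column = [ex[i] for ex in examples]
            self.targets.append(Target(
                f0=round(mean(t.f0 for t in column), 3),            # average pitch
                pause=sum(t.pause for t in column) * 2 > len(column)))  # majority vote

    def matches(self, chunk):
        return self.pattern.fullmatch(chunk) is not None


def predict(chunks, mini_models):
    """Merge step: a chunk matched by a mini model gets its learned pattern
    (one target per character); any other chunk falls back to the rule (one
    target per word)."""
    out = []
    for chunk in chunks:
        model = next((m for m in mini_models if m.matches(chunk)), None)
        if model is not None:
            out.extend(model.targets)
        else:
            out.extend(rule_targets(chunk.split()))
    return out


# Invented example "recordings" of two zip codes, each read as 3 + 2 digits.
ex1 = [Target(1.2, False), Target(1.1, False), Target(1.0, True),
       Target(1.1, False), Target(0.8, True)]
ex2 = [Target(1.0, False), Target(1.1, False), Target(1.0, True),
       Target(0.9, False), Target(0.8, True)]
zip_model = MiniModel(r"\d{5}", [ex1, ex2])
targets = predict(["my zip code is", "02139"], [zip_model])
```

Here the four words of the carrier phrase receive rule-based targets. The zip code receives the learned 3 + 2 grouping, with a break after the third digit. A string that does not fit the domain, such as a 4-digit number, silently falls back to the rule. Keeping the fallback explicit is one simple way to limit the data-sparsity problem to the domains where enough homogeneous examples exist.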
Cite as: Aylett, M. (2004) Merging data driven and rule based prosodic models for unit selection TTS. Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5), 55-60
@inproceedings{aylett04_ssw, author={Matthew Aylett}, title={{Merging data driven and rule based prosodic models for unit selection TTS}}, year=2004, booktitle={Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5)}, pages={55--60} }