5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Multi-Lingual Duration Modeling

Jan van Santen, Chilin Shih, Bernd Möbius, Evelyne Tzoukermann, Michael Tanenblatt

Lucent Technologies - Bell Labs, Murray Hill, NJ, USA

Controlling timing in text-to-speech synthesis systems is complicated because many contextual factors affect timing; moreover, which factors matter, and what their precise effects are, vary among languages. We describe here a language-independent approach to duration control. At run time, a language-independent timing module accesses language-specific tables. These tables specify which sub-classes of the feature space (i.e., all combinations of context and phone identity) are homogeneous in the specific sense that the same factors have similar effects on all cases in a sub-class. Within a sub-class, durations are modeled by simple arithmetic models such as multiplicative, additive, or, more generally, sums-of-products models. Exploratory statistical methods (supervised) and parameter estimation techniques (unsupervised) are used for table construction.
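As an illustration of the model family named above, the following minimal sketch evaluates a sums-of-products model, in which a duration is a sum of terms and each term is a product of per-factor parameters. It is not the authors' implementation: the function names, the table layout, the sub-class labels, the factor names, and all numeric values are hypothetical and chosen only to show how multiplicative and additive models arise as special cases.

```python
from math import prod

def sums_of_products(terms, features):
    """Evaluate a sums-of-products duration model.

    terms: list of terms; each term maps a factor name to a table
           {factor level: parameter}. A term's value is the product
           of the parameters selected by the feature vector.
    features: dict mapping factor name to the observed level.
    """
    return sum(
        prod(table[features[factor]] for factor, table in term.items())
        for term in terms
    )

# Hypothetical language-specific tables, one model per sub-class.
# All numbers are invented for illustration (durations in ms).
TABLES = {
    # Multiplicative model: a single term containing every factor.
    "vowel": [
        {
            "phone": {"a": 100.0, "i": 80.0},
            "stress": {"stressed": 1.2, "unstressed": 1.0},
        }
    ],
    # Additive model: one single-factor term per factor.
    "consonant": [
        {"phone": {"t": 60.0, "s": 90.0}},
        {"stress": {"stressed": 20.0, "unstressed": 0.0}},
    ],
}

def predict_duration(subclass, features):
    """Language-independent lookup: pick the sub-class table, then evaluate."""
    return sums_of_products(TABLES[subclass], features)

vowel_ms = predict_duration("vowel", {"phone": "a", "stress": "stressed"})
consonant_ms = predict_duration("consonant", {"phone": "t", "stress": "stressed"})
```

In this sketch the evaluation code contains nothing language-specific; only the contents of TABLES would change from one language to another, which mirrors the division between timing module and tables described in the abstract.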

