5th International Conference on Spoken Language Processing
Accurate duration modeling is necessary for synthetic speech to sound natural. Over the past few years, the sums-of-products framework has emerged as an effective way to account for contextual influences on phoneme duration. This approach is generally applied after log-transforming the durations. This paper presents empirical and theoretical evidence suggesting that the log transformation is not optimal. A promising alternative is proposed, based on a root sinusoidal function. Preliminary experimental results were obtained on over 50,000 phonemes in varied prosodic contexts. Compared to the log transformation, the new transformation reduced the unexplained proportion of the standard deviation by approximately 30%. Alternatively, for a given level of performance, the root sinusoidal transformation roughly halved the number of regression parameters required.
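As an illustration of the evaluation measure quoted above, the following minimal sketch computes an unexplained proportion of the standard deviation and the relative reduction between two models. The definition used here (standard deviation of the residuals divided by standard deviation of the observed values) is an assumption about what the abstract means, not the paper's stated formula, and the function names are hypothetical.

```python
import math


def std(values):
    """Population standard deviation of a sequence of numbers."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def unexplained_std_proportion(observed, predicted):
    """Assumed definition: std of residuals divided by std of observations.

    Both sequences are expected to be in the same domain (for example,
    both transformed durations), which is an assumption of this sketch.
    """
    residuals = [o - p for o, p in zip(observed, predicted)]
    return std(residuals) / std(observed)


def relative_reduction(baseline, improved):
    """Fractional decrease of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline
```

Under this reading, a reduction of approximately 30% would correspond to `relative_reduction(p_log, p_root_sin)` being close to 0.3, where the two hypothetical arguments are the proportions measured for the log and root sinusoidal transformations respectively.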
Bibliographic reference. Bellegarda, Jerome R. / Silverman, Kim E. A. (1998): "Improved duration modeling of English phonemes using a root sinusoidal transformation", In ICSLP-1998, paper 0135.