ISCA Archive SSW 2004

F0 modeling with multi-layer additive modeling based on a statistical learning technique

Shinsuke Sakai

In this paper, we describe research on fundamental frequency (F0) modeling based on a statistical learning technique called additive models. A two-layer additive F0 model consists of a long-term, intonational phrase-level component and a short-term, accentual phrase-level component. The model can be learned from data using a backfitting algorithm, which optimizes a penalized least-squares criterion defined on the model. Backfitting estimates the two components simultaneously by iteratively applying cubic spline smoothers. To investigate the further flexibility of the model, we incorporated a third additive term that represents a contextual effect on an accentual phrase, and confirmed improvements in terms of RMS errors. Experimental results on a 7,000-utterance Japanese speech corpus show F0 RMS errors of 28.5 and 29.3 Hz on the training and test data, respectively, with corresponding correlation coefficients of 0.81 and 0.79.
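
The backfitting idea described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation. It assumes two hypothetical inputs, `u` (relative position within an intonational phrase) and `v` (relative position within an accentual phrase), uses synthetic data rather than the Japanese corpus, and substitutes a penalized cubic regression spline (truncated-power basis with a ridge penalty) for the cubic spline smoother named in the abstract. All function names, the knot placement, and the value of `lam` are illustrative choices.

```python
import numpy as np

def spline_basis(x, knots):
    """Cubic truncated-power basis: 1, x, x^2, x^3, (x - k)_+^3 per knot."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def smooth(x, r, knots, lam):
    """Penalized least-squares fit of residual r on x; returns fitted values.
    Stand-in smoother (assumption), not the paper's cubic spline smoother."""
    B = spline_basis(x, knots)
    penalty = np.zeros(B.shape[1])
    penalty[4:] = 1.0  # penalize only the knot coefficients
    beta = np.linalg.solve(B.T @ B + lam * np.diag(penalty), B.T @ r)
    return B @ beta

def backfit(y, u, v, knots, lam=1e-3, n_iter=20):
    """Two-component additive fit y ~ mean + g(u) + h(v) by backfitting."""
    mean = y.mean()
    g = np.zeros_like(y)
    h = np.zeros_like(y)
    for _ in range(n_iter):
        # Smooth the partial residual for each component in turn,
        # then center it so the overall mean stays identifiable.
        g = smooth(u, y - mean - h, knots, lam)
        g -= g.mean()
        h = smooth(v, y - mean - g, knots, lam)
        h -= h.mean()
    return mean, g, h

# Synthetic F0-like data (hypothetical): a declining long-term trend plus
# a short-term accent-like bump, with additive Gaussian noise, in Hz.
rng = np.random.default_rng(0)
n = 2000
u = rng.uniform(0.0, 1.0, n)
v = rng.uniform(0.0, 1.0, n)
y = 120.0 - 40.0 * u + 25.0 * np.sin(np.pi * v) + rng.normal(0.0, 5.0, n)

knots = np.linspace(0.2, 0.8, 4)
mean, g, h = backfit(y, u, v, knots)
pred = mean + g + h

# The two evaluation measures quoted in the abstract.
rmse = np.sqrt(np.mean((y - pred) ** 2))
corr = np.corrcoef(y, pred)[0, 1]
print(f"RMSE = {rmse:.2f} Hz, correlation = {corr:.3f}")
```

Each pass refits one component to the residual left by the other, so both components are refined together rather than estimated one after the other. A third additive term, such as the contextual effect mentioned in the abstract, would add one more smoothing step of the same form inside the loop.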


Cite as: Sakai, S. (2004) F0 modeling with multi-layer additive modeling based on a statistical learning technique. Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5), 151-154

@inproceedings{sakai04_ssw,
  author={Shinsuke Sakai},
  title={{F0 modeling with multi-layer additive modeling based on a statistical learning technique}},
  year=2004,
  booktitle={Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5)},
  pages={151--154}
}