ISCA Archive Interspeech 2006

Totally data-driven intonation prediction model using a novel F0 contour parametric representation

Lifu Yi, Jian Li, Xiaoyan Lou, Jie Hao

This paper proposes a novel parametric representation of Mandarin intonation based on orthogonal polynomial approximation. The polynomial is a simplified representation of the Parallel Encoding and Target Approximation (PENTA) intonation model and includes a target component and an approximation component. We also propose predicting the polynomial parameters from linguistic and phonetic attributes with generalized linear models (GLMs). The optimal attributes are selected automatically by stepwise regression, so both the model structures and the model coefficients are optimized in a totally data-driven manner. In addition, speaking rate is introduced as a new attribute for prediction. When the method is applied to intonation prediction for Mandarin speech, it achieves an F0 RMSE of 30.21 Hz and a correlation coefficient of 0.85 in an open test. Informal perceptual experiments show that the predicted intonation is quite appropriate and natural.
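
As a rough illustration of the idea of describing an F0 contour by a few orthogonal polynomial coefficients and scoring a prediction by RMSE and correlation, the sketch below may help. It is not the authors' implementation: the abstract does not state the polynomial family, order, or time normalization, so the Legendre basis, the default degree of 3, the mapping of frames onto [-1, 1], and all function names here are assumptions made for illustration only.

```python
import numpy as np
from numpy.polynomial import legendre


def fit_contour(f0_hz, degree=3):
    """Least-squares fit of Legendre coefficients to one F0 contour.

    Assumption: frames are evenly spaced and mapped onto [-1, 1].
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    x = np.linspace(-1.0, 1.0, len(f0_hz))
    return legendre.legfit(x, f0_hz, degree)


def reconstruct_contour(coeffs, n_frames):
    """Evaluate the fitted polynomial back on n_frames evenly spaced points."""
    x = np.linspace(-1.0, 1.0, n_frames)
    return legendre.legval(x, coeffs)


def f0_rmse(reference_hz, predicted_hz):
    """Root-mean-square error between two contours, in Hz."""
    diff = np.asarray(reference_hz, dtype=float) - np.asarray(predicted_hz, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def f0_correlation(reference_hz, predicted_hz):
    """Pearson correlation coefficient between two contours."""
    return float(np.corrcoef(reference_hz, predicted_hz)[0, 1])


# Synthetic contour (hypothetical data, not taken from the paper):
# a falling, slightly curved pitch movement around 200 Hz.
t = np.linspace(-1.0, 1.0, 50)
example_f0 = 200.0 + 30.0 * t - 20.0 * t ** 2

example_coeffs = fit_contour(example_f0, degree=3)
example_rebuilt = reconstruct_contour(example_coeffs, len(example_f0))
```

In a pipeline of the kind the abstract describes, the coefficients would be the regression targets predicted from linguistic and phonetic attributes, and `f0_rmse` and `f0_correlation` would be computed between reference contours and contours rebuilt from the predicted coefficients.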


doi: 10.21437/Interspeech.2006-110

Cite as: Yi, L., Li, J., Lou, X., Hao, J. (2006) Totally data-driven intonation prediction model using a novel F0 contour parametric representation. Proc. Interspeech 2006, paper 1465-Mon2A3O.5, doi: 10.21437/Interspeech.2006-110

@inproceedings{yi06_interspeech,
  author={Lifu Yi and Jian Li and Xiaoyan Lou and Jie Hao},
  title={{Totally data-driven intonation prediction model using a novel F0 contour parametric representation}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1465-Mon2A3O.5},
  doi={10.21437/Interspeech.2006-110}
}