This paper presents a two-step model for the symbolic coding and generation of intonation. First, the F0 curve is reduced to a series of pitch target points that capture the macroprosodic information of the utterance; these target points are then converted into a sequence of labels. Generation is achieved by reversing these steps. The model is language-independent and requires no prior training on the data. We discuss how the number of categories affects the precision of fit. An evaluation on a large multilingual corpus (4 hours 20 minutes of speech, 50 speakers, 5 languages) shows that a model with three ascending and three descending categories, plus one category for small or null movements, regenerates ca. 99% of the target points at less than 2 semitones (ST) from their original values. Since the model lends itself to various improvements, it appears to be a good candidate for practical applications.
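As a rough illustration of the second step and its inverse, the sketch below codes a sequence of pitch target points as labels drawn from seven interval categories (three descending, one small or null, three ascending) and then regenerates the points from the labels. Several choices here are assumptions for illustration only and are not the authors' definitions: the category names (`D3`…`U3`, `S`), their semitone centres, the 100 Hz reference, and the choice to code each interval relative to the *regenerated* previous point rather than the original one, which keeps errors from accumulating. The target points are taken as given rather than extracted from an F0 curve.

```python
import math

# Hypothetical category centres in semitones (ST); NOT the paper's values.
# Three descending categories, one "small or null" category, three ascending ones.
CATEGORIES = {
    "D3": -10.0, "D2": -6.0, "D1": -3.0,
    "S": 0.0,
    "U1": 3.0, "U2": 6.0, "U3": 10.0,
}
REF_HZ = 100.0  # assumed reference frequency for the ST scale


def hz_to_st(f_hz: float) -> float:
    """Convert a frequency in Hz to semitones relative to REF_HZ."""
    return 12.0 * math.log2(f_hz / REF_HZ)


def st_to_hz(st: float) -> float:
    """Convert semitones relative to REF_HZ back to Hz."""
    return REF_HZ * 2.0 ** (st / 12.0)


def nearest_label(interval_st: float) -> str:
    """Pick the category whose centre is closest to the interval."""
    return min(CATEGORIES, key=lambda lab: abs(CATEGORIES[lab] - interval_st))


def encode(targets_hz):
    """Code target points as (first value in Hz, list of interval labels)."""
    first = targets_hz[0]
    labels = []
    current = hz_to_st(first)  # mirrors the value the decoder will hold
    for f in targets_hz[1:]:
        lab = nearest_label(hz_to_st(f) - current)
        labels.append(lab)
        current += CATEGORIES[lab]  # closed loop: track the decoder's state
    return first, labels


def decode(first_hz, labels):
    """Regenerate target points (Hz) from the first value and the labels."""
    current = hz_to_st(first_hz)
    out = [first_hz]
    for lab in labels:
        current += CATEGORIES[lab]
        out.append(st_to_hz(current))
    return out


def error_st(original_hz, regenerated_hz):
    """Absolute deviation in ST between original and regenerated points."""
    return [abs(hz_to_st(a) - hz_to_st(b)) for a, b in zip(original_hz, regenerated_hz)]


if __name__ == "__main__":
    targets = [110.0, 150.0, 130.0, 128.0, 95.0, 100.0]  # made-up target points
    first, labels = encode(targets)
    regen = decode(first, labels)
    print(labels)
    print(max(error_st(targets, regen)))
```

With these made-up target points the labels come out as `U2 D1 S D2 S` and every regenerated point stays within 2 ST of its original. Because each interval is measured from the regenerated previous point, the deviation of any point is bounded by half the widest gap between neighbouring centres (here 2 ST), as long as no interval lies more than 2 ST beyond the outermost centre. Coarser or finer category sets trade label economy against this bound, which is the trade-off the abstract refers to.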
Cite as: Véronis, J., Campione, E. (1998) Towards a reversible symbolic coding of intonation. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0846, doi: 10.21437/ICSLP.1998-164
@inproceedings{veronis98_icslp, author={Jean Véronis and Estelle Campione}, title={{Towards a reversible symbolic coding of intonation}}, year=1998, booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)}, pages={paper 0846}, doi={10.21437/ICSLP.1998-164} }