In this study, we compared three long-term fundamental frequency (f0) estimates (mean, median and base value) with respect to how fast they approach a stable value, as a function of language, speaking style and speaker. The base value concept was developed in search of an f0 value that is invariant under prosodic variation. It has since also been tested in forensic phonetics as a possible speaker-specific f0 value. The data used in this study had been recorded for a previous project and comprise speech by male and female speakers in seven languages and three speaking styles: spontaneous speech, phrase reading and word list reading. Average stabilisation times for the mean, median and base value are 9.76, 9.67 and 8.01 s, respectively. Base values stabilise significantly faster than the other two estimators. Languages differ in both the average and the variability of the stabilisation times. Across languages, values range from 7.14 to 11.41 s (mean), 7.5 to 11.33 s (median) and 6.74 to 9.34 s (base value). Spontaneous speech yields the most variable stabilisation times for all three estimators in Italian and Swedish, for the median in French and Portuguese, and for the base value in German. Speakers within each language do not differ significantly in terms of stabilisation time variability for the three estimators.
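To make the notion of stabilisation time concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes that f0 is given as a list of voiced-frame values in Hz at a fixed frame step, and that an estimate counts as stable from the first point after which the running estimate stays within a relative tolerance of its final value. The function names, the `frame_step_s` and `tolerance` parameters and the tolerance criterion itself are all illustrative assumptions. The base value is left out because the abstract does not define how it is computed.

```python
import statistics


def running_estimates(f0, estimator):
    """Return the cumulative estimate after each successive f0 sample."""
    return [estimator(f0[:i]) for i in range(1, len(f0) + 1)]


def stabilisation_time(f0, estimator, frame_step_s=0.01, tolerance=0.02):
    """Time (s) from which the running estimate stays near its final value.

    Assumed criterion (not taken from the paper): the estimate is stable
    once it remains within `tolerance` (relative) of the final estimate.
    """
    running = running_estimates(f0, estimator)
    final = running[-1]
    band = tolerance * abs(final)

    # Find the last sample at which the running estimate is outside the band.
    last_outside = -1
    for i, value in enumerate(running):
        if abs(value - final) > band:
            last_outside = i

    # Sample i is available after (i + 1) frames; stability starts one
    # sample after the last excursion.
    return (last_outside + 2) * frame_step_s


# Toy contour: one high initial frame followed by a flat stretch.
f0_hz = [120.0, 100.0, 100.0, 100.0]
mean_time = stabilisation_time(f0_hz, statistics.mean, frame_step_s=1.0, tolerance=0.05)
median_time = stabilisation_time(f0_hz, statistics.median, frame_step_s=1.0, tolerance=0.05)
```

With real recordings, the same function could be applied per speaker, language and speaking style to obtain the kind of stabilisation times summarised above, although the absolute values would depend on the assumed tolerance.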
Cite as: Arantes, P., Eriksson, A., Gutzeit, S. (2017) Effect of Language, Speaking Style and Speaker on Long-Term F0 Estimation. Proc. Interspeech 2017, 3897-3901, doi: 10.21437/Interspeech.2017-449
@inproceedings{arantes17_interspeech,
  author={Pablo Arantes and Anders Eriksson and Suska Gutzeit},
  title={{Effect of Language, Speaking Style and Speaker on Long-Term F0 Estimation}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3897--3901},
  doi={10.21437/Interspeech.2017-449}
}