Automatic Assessment of L2 English Word Prosody Using Weighted Distances of F0 and Intensity Contours

Quy-Thao Truong, Tsuneo Kato, Seiichi Yamamoto


This paper proposes an automatic prosody assessment method for learners of English based on a weighted comparison of fundamental frequency (F0) and intensity contours. Learners' F0 and intensity patterns are compared to those of native speakers using a proposed metric, a weighted distance, in which errors around the high values of the prosodic features carry more weight in the computation of the final distance. Gold-standard native references are built using the k-means clustering algorithm. Because k-means requires the number of clusters to be chosen in advance, we also propose a data-driven criterion, called weighted variance, based on the weighted similarity within the whole set of native utterances, to determine the optimal number of clusters k. Baseline contour comparison metrics yielded a subjective-objective score correlation of 0.278, whereas our method combining the proposed metric and criterion reached a final subjective-objective score correlation of 0.304. For comparison, subjective scores correlated at 0.480.
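
The idea of a weighted distance can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the weighting scheme below is an assumption: each frame's squared error is weighted by the min-max normalised reference value plus a small floor. The function name `weighted_distance` and the `floor` parameter are hypothetical, and both contours are assumed to be already time-aligned and resampled to the same length.

```python
import math


def weighted_distance(learner, reference, floor=0.1):
    """Weighted RMS distance between two equal-length contours.

    Assumption (not taken from the paper): the weight of each frame is the
    min-max normalised reference value plus `floor`, so frames where the
    reference contour is high contribute more to the distance.
    """
    if len(learner) != len(reference) or len(reference) == 0:
        raise ValueError("contours must be non-empty and of equal length")

    lo, hi = min(reference), max(reference)
    span = hi - lo
    if span == 0:
        # Flat reference: fall back to uniform weights.
        weights = [1.0] * len(reference)
    else:
        weights = [floor + (r - lo) / span for r in reference]

    total = sum(weights)
    weighted_sq_error = sum(
        w * (l - r) ** 2 for w, l, r in zip(weights, learner, reference)
    )
    return math.sqrt(weighted_sq_error / total)


# Toy contours: the reference has a single peak at the third frame.
reference = [0.0, 0.0, 1.0, 0.0]
miss_peak = [0.0, 0.0, 0.0, 0.0]    # error located at the reference peak
miss_valley = [1.0, 0.0, 1.0, 0.0]  # same-size error located at a low frame

d_peak = weighted_distance(miss_peak, reference)
d_valley = weighted_distance(miss_valley, reference)
```

Under this assumed weighting, the error at the peak yields a larger distance than an error of the same magnitude at a low-valued frame, which is the behaviour the abstract describes qualitatively.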


DOI: 10.21437/Interspeech.2018-1386

Cite as: Truong, Q., Kato, T., Yamamoto, S. (2018) Automatic Assessment of L2 English Word Prosody Using Weighted Distances of F0 and Intensity Contours. Proc. Interspeech 2018, 2186-2190, DOI: 10.21437/Interspeech.2018-1386.


@inproceedings{Truong2018,
  author={Quy-Thao Truong and Tsuneo Kato and Seiichi Yamamoto},
  title={Automatic Assessment of L2 English Word Prosody Using Weighted Distances of F0 and Intensity Contours},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2186--2190},
  doi={10.21437/Interspeech.2018-1386},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1386}
}