Automatic Pronunciation Evaluation of Singing

Chitralekha Gupta, Haizhou Li, Ye Wang


In this work, we develop a strategy to automatically evaluate the pronunciation of singing. We apply a singing-adapted automatic speech recognizer (ASR) in a two-stage approach. First, we force-align the lyrics with the sung utterances to obtain word boundaries, which we refine with a novel lexical modification technique. Second, we investigate the performance of phonetic posteriorgram (PPG) based template-independent and template-dependent methods for scoring the aligned words. To validate the evaluation scheme, we obtain reliable human pronunciation evaluation scores using a crowd-sourcing platform. We show that the automatic evaluation scheme offers quality scores that are close to human judgments.
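
As a rough illustration of the second stage, the sketch below scores one aligned word from its PPG in two ways. The abstract does not specify the scoring formulas, so both are assumptions: the template-independent score is taken to be the mean log-posterior of the expected phone per frame, and the template-dependent score is taken to be a length-normalised dynamic time warping (DTW) cost against a reference rendition's PPG. The function names and the frame-level `phone_ids` input (assumed to come from the forced alignment) are hypothetical.

```python
import numpy as np


def template_independent_score(ppg, phone_ids):
    """Mean log-posterior of the expected phone over the frames of one word.

    ppg: (frames, phones) array of posteriors (assumed input format).
    phone_ids: expected phone index for each frame, e.g. from forced alignment.
    Higher (closer to 0) means the expected phones were more probable.
    """
    ppg = np.asarray(ppg, dtype=float)
    phone_ids = np.asarray(phone_ids, dtype=int)
    probs = ppg[np.arange(len(phone_ids)), phone_ids]
    return float(np.mean(np.log(probs + 1e-10)))  # epsilon avoids log(0)


def template_dependent_score(test_ppg, ref_ppg):
    """Length-normalised DTW cost between a test PPG and a reference PPG.

    Uses Euclidean distance between frames (an assumption). Lower is closer.
    """
    a = np.asarray(test_ppg, dtype=float)
    b = np.asarray(ref_ppg, dtype=float)
    n, m = len(a), len(b)
    # Pairwise frame distances, shape (n, m).
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = local[i - 1, j - 1] + best_prev
    return float(acc[n, m] / (n + m))
```

The template-independent variant needs only the lyrics and their alignment, while the template-dependent variant additionally needs a reference singer's PPG for the same word; DTW is used here so that the two renditions may differ in duration.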


DOI: 10.21437/Interspeech.2018-1267

Cite as: Gupta, C., Li, H., Wang, Y. (2018) Automatic Pronunciation Evaluation of Singing. Proc. Interspeech 2018, 1507-1511, DOI: 10.21437/Interspeech.2018-1267.


@inproceedings{Gupta2018,
  author={Chitralekha Gupta and Haizhou Li and Ye Wang},
  title={Automatic Pronunciation Evaluation of Singing},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1507--1511},
  doi={10.21437/Interspeech.2018-1267},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1267}
}