ISCA Archive Interspeech 2021

Deep Feature Transfer Learning for Automatic Pronunciation Assessment

Binghuai Lin, Liyuan Wang

Automatic pronunciation assessment is commonly used to evaluate the pronunciation quality of second language (L2) learners. Traditional methods normally rely on speech features such as goodness of pronunciation (GOP), which may not provide sufficient information for assessing pronunciation proficiency [1]. In this paper, we propose a transfer learning method for automatic pronunciation assessment. Instead of traditional features such as GOP, we directly use the deep features from the acoustic model, transferring the acoustic knowledge from automatic speech recognition (ASR) to a dedicated scoring module. The scoring module uses an attention mechanism to model the relationships among different granularities in an utterance. Only this module is updated, which allows faster transfer and adaptation to various pronunciation assessment tasks. Experimental results on a dataset recorded by Chinese English-as-a-second-language (ESL) learners and on the Speechocean762 dataset demonstrate that the proposed method outperforms traditional GOP-based baselines in Pearson correlation coefficient (PCC) and yields parameter-efficient transfer across different pronunciation assessment tasks.
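
As a rough illustration of two ideas named in the abstract, the sketch below pools a sequence of feature vectors with a generic dot-product attention and computes the Pearson correlation coefficient between predicted and reference scores. It is not the paper's scoring module: the function names (`attention_pool`, `pearson`), the fixed `query` vector, and the toy numbers are all assumptions made for illustration, and the acoustic model that would supply the deep features is not shown.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Pool feature vectors into one vector (hypothetical helper).

    Each vector is scored by its dot product with `query`; the softmax of
    those scores gives the weights of a weighted average.
    """
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy stand-ins for phone-level deep features of one utterance (made-up values).
phone_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
utterance_vector, attn_weights = attention_pool(phone_features, query=[0.0, 0.0])

# Toy predicted vs. human scores (made-up values) to exercise the metric.
pcc = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

With an all-zero query the attention weights are uniform, so the pooled vector reduces to the plain mean of the inputs. In a trained scoring module the query (or an equivalent set of parameters) would be learned while the feature extractor stays fixed.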

doi: 10.21437/Interspeech.2021-931

Cite as: Lin, B., Wang, L. (2021) Deep Feature Transfer Learning for Automatic Pronunciation Assessment. Proc. Interspeech 2021, 4438-4442, doi: 10.21437/Interspeech.2021-931

  author={Binghuai Lin and Liyuan Wang},
  title={{Deep Feature Transfer Learning for Automatic Pronunciation Assessment}},
  booktitle={Proc. Interspeech 2021},