Landmark-based pronunciation error identification on L2 Mandarin Chinese

Xuesong Yang, Xiang Kong, Mark Hasegawa-Johnson, Yanlu Xie

This paper explores a novel approach to identifying pronunciation errors made by second-language (L2) learners, based on the landmark theory of human speech perception. Earlier work on distinctive-feature selection and on the likelihood-based "goodness of pronunciation" (GOP) measure has made progress for several L2 languages, e.g. Dutch and English. However, performance gains have been limited by error-prone automatic speech recognition (ASR) systems and by features that discriminate poorly. Landmark theory, which exploits quantal nonlinear relationships between articulation and acoustics, provides a basis for selecting distinctive feature positions that are suitable for identifying pronunciation errors. By leveraging this English acoustic landmark theory, we propose to select salient Mandarin Chinese phonetic landmarks for the 16 phonemes most frequently mispronounced by Japanese (L1) learners, and to extract the corresponding features, including mel-frequency cepstral coefficients (MFCC) and formants. Both cross-validation and evaluation are performed for each individual phoneme using a support vector machine with a linear kernel (LinearSVM). Experiments show that our landmark-based approaches achieve significantly higher kappa and F1 scores than GOP-based methods, which calculate a duration-normalized confidence score for each phoneme.
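The two evaluation measures named above, kappa and F1, can be illustrated with a minimal sketch. It assumes binary per-phoneme labels in which 1 marks a mispronounced token and 0 a correct one; that encoding, the function names, and the toy labels are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(y_true)
    # Fraction of tokens on which the reference and the classifier agree.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected by chance, from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Degenerate case (both sequences use one identical label);
        # returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy labels for one phoneme (assumed encoding: 1 = mispronounced, 0 = correct).
reference = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 1]

print(f"kappa = {cohen_kappa(reference, predicted):.3f}")
print(f"F1    = {f1_score(reference, predicted):.3f}")
```

Kappa is informative here because correct tokens typically outnumber mispronounced ones, so raw accuracy can look high even for a classifier that rarely flags an error.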

DOI: 10.21437/SpeechProsody.2016-51

Cite as

Yang, X., Kong, X., Hasegawa-Johnson, M., Xie, Y. (2016) Landmark-based pronunciation error identification on L2 Mandarin Chinese. Proc. Speech Prosody 2016, 247-251.

author={Xuesong Yang and Xiang Kong and Mark Hasegawa-Johnson and Yanlu Xie},
title={Landmark-based pronunciation error identification on L2 Mandarin Chinese},
booktitle={Speech Prosody 2016},