Automatic Characterisation of the Pronunciation of Non-native English Speakers using Phone Distance Features

Konstantinos Kyriakopoulos, Mark Gales, Kate Knill


The distances between phones in acoustic space, and their relative movements, have been shown to be indicative of a language learner's proficiency, in a way that is compact and independent of bias-inducing voice qualities. Typically these features are based on known transcriptions, as in "read aloud" style tasks. This paper examines the information that can be extracted about speakers from phone distance features (PDFs) when the transcription is unknown. Here, phone distances are obtained by measuring the relative entropy between a distribution trained on the speaker's manner of pronunciation of each of the phones of the English language and distributions trained on each of the other phones. These features are extracted from untranscribed audio and so rely on automatic speech recognition (ASR) output. The ASR can have high word error rates, as spontaneous, non-native speech is being recognised. Two forms of speaker characterisation are examined using these features: first, the use of PDFs to predict the speaker's proficiency and, second, their use in classifying the mother tongue (L1) of the speaker. For both tasks, recorded answers to sections of the BULATS English Speaking test were used. Using only PDFs to predict the grade within a Gaussian Process based grader gave performance comparable to using a range of standard fluency style features. This indicates the robustness of PDFs to errors in the ASR output. Additionally, the same PDFs can detect with high accuracy the L1 of the speakers from among 21 L1s using a deep neural network based classifier. Experiments on South American Spanish show that it is further possible to discriminate between the speakers' countries of origin.
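As a rough illustration of the phone distance idea described above, the following sketch fits one Gaussian per phone to a speaker's acoustic frames and takes the relative entropy (KL divergence) between every pair of phone models as the feature vector. It is a minimal sketch under stated assumptions, not the authors' implementation: the single full-covariance Gaussian per phone, the symmetrised form of the divergence, the ridge term, and the names `gaussian_kl` and `phone_distance_features` are all assumptions introduced here. The frames are assumed to be already aligned to phones (in the paper's setting that alignment would come from ASR output).

```python
import numpy as np


def gaussian_kl(mu_p, cov_p, mu_q, cov_q):
    """Closed-form KL(p || q) between two multivariate Gaussians."""
    d = mu_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (
        np.trace(cov_q_inv @ cov_p)
        + diff @ cov_q_inv @ diff
        - d
        + logdet_q
        - logdet_p
    )


def phone_distance_features(frames_by_phone, ridge=1e-3):
    """Return (sorted phone labels, vector of pairwise phone distances).

    frames_by_phone: dict mapping a phone label to an array of shape
    (n_frames, n_dims), n_dims >= 2, holding that speaker's frames.
    Assumption: one Gaussian per phone, symmetrised KL as the distance.
    """
    phones = sorted(frames_by_phone)
    models = {}
    for ph in phones:
        x = np.asarray(frames_by_phone[ph], dtype=float)
        mu = x.mean(axis=0)
        # The ridge term keeps the covariance invertible for sparse phones.
        cov = np.cov(x, rowvar=False) + ridge * np.eye(x.shape[1])
        models[ph] = (mu, cov)

    feats = []
    for i, a in enumerate(phones):
        for b in phones[i + 1:]:
            kl_ab = gaussian_kl(*models[a], *models[b])
            kl_ba = gaussian_kl(*models[b], *models[a])
            feats.append(0.5 * (kl_ab + kl_ba))
    return phones, np.array(feats)


if __name__ == "__main__":
    # Synthetic stand-in data: three hypothetical phones, 13-dim frames.
    rng = np.random.default_rng(0)
    demo = {
        "aa": rng.normal(0.0, 1.0, size=(200, 13)),
        "iy": rng.normal(0.5, 1.0, size=(200, 13)),
        "s": rng.normal(2.0, 1.5, size=(200, 13)),
    }
    labels, pdf_vector = phone_distance_features(demo)
    print(labels, pdf_vector.shape)
```

With N phones this yields N(N-1)/2 distances per speaker. Because each distance compares two models trained on the same speaker, shifts that affect all of that speaker's phones alike tend to cancel, which is consistent with the abstract's point about independence from voice qualities.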


DOI: 10.21437/SLaTE.2017-11

Cite as: Kyriakopoulos, K., Gales, M., Knill, K. (2017) Automatic Characterisation of the Pronunciation of Non-native English Speakers using Phone Distance Features. Proc. 7th ISCA Workshop on Speech and Language Technology in Education, 59-64, DOI: 10.21437/SLaTE.2017-11.


@inproceedings{Kyriakopoulos2017,
  author={Konstantinos Kyriakopoulos and Mark Gales and Kate Knill},
  title={Automatic Characterisation of the Pronunciation of Non-native English Speakers using Phone Distance Features},
  year=2017,
  booktitle={Proc. 7th ISCA Workshop on Speech and Language Technology in Education},
  pages={59--64},
  doi={10.21437/SLaTE.2017-11},
  url={http://dx.doi.org/10.21437/SLaTE.2017-11}
}