Automatic assessment of speaker nativeness has multiple applications, such as second language learning and interactive voice response (IVR) systems. In this paper, we treat it as a regression problem, since the available labels are on a continuous scale. We applied multiple approaches, namely phonotactic models, i-vectors, and goodness of pronunciation, covering both segmental and suprasegmental features. The phonotactic models were trained either on the challenge data or on additional multilingual data from other domains. The resulting scores were then combined in multiple ways and fed to a support vector machine regressor. Results on the test set surpass the provided baseline and are in line with the results obtained on the remaining sets, which suggests that our models generalize well to other datasets.
Cite as: Ribeiro, E., Ferreira, J., Olcoz, J., Abad, A., Moniz, H., Batista, F., Trancoso, I. (2015) Combining multiple approaches to predict the degree of nativeness. Proc. Interspeech 2015, 488-492, doi: 10.21437/Interspeech.2015-181
@inproceedings{ribeiro15_interspeech,
  author={Eugénio Ribeiro and Jaime Ferreira and Julia Olcoz and Alberto Abad and Helena Moniz and Fernando Batista and Isabel Trancoso},
  title={{Combining multiple approaches to predict the degree of nativeness}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={488--492},
  doi={10.21437/Interspeech.2015-181}
}