ISCA Archive SpeechProsody 2022

Investigating the usefulness of i-vectors for automatic language characterization

Maureen de Seyssel, Guillaume Wisniewski, Emmanuel Dupoux, Bogdan Ludusan

Recent work has shown that automatic methods are useful for the study of linguistic typology. However, most proposed approaches come from natural language processing and require expert knowledge to predict typological information for new languages. An alternative would be to use speech-based methods that do not need extensive linguistic annotations, but considerably less work has been done in this direction. The current study aims to reduce this gap by investigating a promising speech representation, i-vectors, which, by capturing suprasegmental features of language, can be used for the automatic characterization of languages. Employing data from 24 languages covering several linguistic families, we computed the i-vector corresponding to each sentence and represented each language by its centroid i-vector. Comparing the distances between the language centroids with the phonological, inventory and syntactic distances between the same languages, we observed a significant correlation between the i-vector distance and the syntactic distance. We then explored a number of syntactic features in more detail and proposed a method for predicting the value of the most promising feature from the i-vector information. The results obtained, an 87% classification accuracy, are encouraging, and we envision extending this method further.
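The centroid-and-distance analysis described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state which distance metric or correlation statistic was used, so cosine distance and a Pearson correlation over the unique language pairs are assumptions made here, and the function names (`language_centroids`, `cosine_distance_matrix`, `upper_triangle_correlation`) are hypothetical. The i-vectors themselves are assumed to be already extracted, one row per sentence.

```python
import numpy as np


def language_centroids(ivectors, labels):
    """Average the sentence-level vectors of each language.

    ivectors: array of shape (n_sentences, dim), assumed precomputed.
    labels:   one language label per sentence.
    Returns the sorted language list and a (n_languages, dim) centroid array.
    """
    ivectors = np.asarray(ivectors, dtype=float)
    labels = np.asarray(labels)
    languages = sorted(set(labels.tolist()))
    centroids = np.stack(
        [ivectors[labels == lang].mean(axis=0) for lang in languages]
    )
    return languages, centroids


def cosine_distance_matrix(x):
    """Pairwise cosine distance between rows (metric is an assumption)."""
    x = np.asarray(x, dtype=float)
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T


def upper_triangle_correlation(d1, d2):
    """Pearson correlation over the unique language pairs of two
    distance matrices (statistic is an assumption)."""
    iu = np.triu_indices_from(d1, k=1)
    return float(np.corrcoef(d1[iu], d2[iu])[0, 1])


if __name__ == "__main__":
    # Synthetic stand-in data: 3 languages, 50 sentences each, 10 dimensions.
    rng = np.random.default_rng(0)
    vectors = rng.normal(size=(150, 10))
    langs = np.repeat(["xx", "yy", "zz"], 50)
    names, cents = language_centroids(vectors, langs)
    ivector_dist = cosine_distance_matrix(cents)
    # A real analysis would load a typological (e.g. syntactic) distance
    # matrix here; a random symmetric placeholder is used instead.
    placeholder = rng.random((3, 3))
    typological_dist = (placeholder + placeholder.T) / 2
    print(names, upper_triangle_correlation(ivector_dist, typological_dist))
```

Only the upper triangle is compared because both matrices are symmetric with a zero diagonal, so each language pair should be counted once. Pairwise distances are not independent observations, so a significance test on such a correlation would need a permutation-based procedure rather than the usual p-value.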


doi: 10.21437/SpeechProsody.2022-94

Cite as: Seyssel, M.d., Wisniewski, G., Dupoux, E., Ludusan, B. (2022) Investigating the usefulness of i-vectors for automatic language characterization. Proc. Speech Prosody 2022, 460-464, doi: 10.21437/SpeechProsody.2022-94

@inproceedings{seyssel22_speechprosody,
  author={Maureen de Seyssel and Guillaume Wisniewski and Emmanuel Dupoux and Bogdan Ludusan},
  title={{Investigating the usefulness of i-vectors for automatic language characterization}},
  year=2022,
  booktitle={Proc. Speech Prosody 2022},
  pages={460--464},
  doi={10.21437/SpeechProsody.2022-94}
}