Neural Detection of Cross-lingual Syntactic Knowledge

Yongjian Chen, Mireia Farrús

In recent years, pretrained multilingual language models such as mBERT and XLM-R have developed rapidly; these models capture linguistic knowledge from input in many languages simultaneously. However, little is known about where such models localise what they have learnt across languages. In this paper, we evaluate the cross-lingual syntactic information embedded in CINO, a more recent multilingual pretrained language model. We probe CINO on Universal Dependencies treebanks of English and Mandarin Chinese with two syntax-related, layer-wise evaluation tasks: part-of-speech tagging at the token level and syntax tree-depth prediction at the sentence level. Our layer-wise probing experiments show that token-level syntax is localisable in the higher layers, consistently across the two typologically different languages, whereas sentence-level syntax is distributed across the layers in both typology-specific and universal manners.
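The abstract does not state how the tree-depth target is defined. One common choice for dependency treebanks is the number of nodes on the longest root-to-leaf path. The following is a minimal sketch, assuming that definition (root counts as depth 1), of how such a sentence-level label could be derived from the HEAD column of a CoNLL-U sentence. The function names `heads_from_conllu` and `tree_depth` are hypothetical, and this is not necessarily the authors' exact definition or preprocessing.

```python
from typing import Dict, List


def heads_from_conllu(block: str) -> List[int]:
    """Collect the HEAD values (column 7) of one CoNLL-U sentence.

    Comment lines, multiword-token ranges (e.g. "1-2") and empty nodes
    (e.g. "8.1") are skipped, so the result covers the basic tree only.
    """
    heads = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        heads.append(int(cols[6]))
    return heads


def tree_depth(heads: List[int]) -> int:
    """Depth of a dependency tree given 1-based heads (0 marks the root).

    Assumption: depth = number of nodes on the longest root-to-leaf path.
    Depths are memoised, so each token is resolved only once.
    """
    n = len(heads)
    memo: Dict[int, int] = {}
    for start in range(1, n + 1):
        path = []
        k = start
        # Climb towards the root until we hit it or an already-solved node.
        while k != 0 and k not in memo:
            path.append(k)
            if len(path) > n:
                raise ValueError("HEAD values contain a cycle")
            k = heads[k - 1]
        base = 0 if k == 0 else memo[k]
        # Assign depths top-down along the climbed path.
        for node in reversed(path):
            base += 1
            memo[node] = base
    return max(memo.values(), default=0)
```

For example, "The cat sat on the mat ." with heads `[2, 3, 0, 6, 6, 3, 3]` has depth 3 (sat → cat → The). Memoising per token keeps the computation linear in sentence length and avoids recursion limits on long sentences.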

doi: 10.21437/IberSPEECH.2022-31

Cite as: Chen, Y., Farrús, M. (2022) Neural Detection of Cross-lingual Syntactic Knowledge. Proc. IberSPEECH 2022, 151-155, doi: 10.21437/IberSPEECH.2022-31

