In recent years, pretrained multilingual language models such as mBERT and XLM-R have developed rapidly; they capture and learn linguistic knowledge from input across a variety of languages simultaneously. However, little is known about where multilingual models localise what they have learnt across languages. In this paper, we specifically evaluate the cross-lingual syntactic information embedded in CINO, a more recent multilingual pre-trained language model. We probe CINO on Universal Dependencies treebanks of English and Mandarin Chinese with two syntax-related layer-wise evaluation tasks: Part-of-Speech Tagging at the token level and Syntax Tree-depth Prediction at the sentence level. The results of our layer-wise probing experiments show that token-level syntax is localised in the higher layers, consistently across the two typologically different languages, whereas sentence-level syntax is distributed across the layers in both typology-specific and universal manners.
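The layer-wise probing setup described above can be illustrated with a short sketch: freeze the pretrained encoder, extract the hidden states of every layer, and fit a simple linear probe per layer. The snippet below is a minimal illustration under stated assumptions, not the paper's implementation: the Hugging Face checkpoint name hfl/cino-base-v2, the mean-pooling step, the use of scikit-learn's LogisticRegression as the probe, and the toy tree-depth labels are all assumptions made for the example.

```python
# Minimal layer-wise probing sketch (illustrative, not the authors' exact setup).
# Assumes CINO is available via Hugging Face transformers as "hfl/cino-base-v2";
# toy labels stand in for Universal Dependencies tree-depth annotations.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "hfl/cino-base-v2"  # assumed checkpoint id; substitute the one actually used

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def layerwise_sentence_vectors(sentences):
    """Return one list of mean-pooled sentence vectors per encoder layer."""
    per_layer = None
    for sent in sentences:
        enc = tokenizer(sent, return_tensors="pt", truncation=True)
        with torch.no_grad():
            out = model(**enc)
        # hidden_states: embedding layer output plus one tensor per transformer layer
        hidden = out.hidden_states
        if per_layer is None:
            per_layer = [[] for _ in hidden]
        for i, layer in enumerate(hidden):
            # mean-pool over tokens -> a single vector per sentence and layer
            per_layer[i].append(layer[0].mean(dim=0).numpy())
    return per_layer

# Toy data in place of UD sentences and their parse-tree depths (labels are invented).
train_sents = ["The cat sat on the mat .", "She reads ."]
train_depths = [3, 2]
test_sents = ["Dogs bark ."]
test_depths = [2]

train_layers = layerwise_sentence_vectors(train_sents)
test_layers = layerwise_sentence_vectors(test_sents)

# Fit one linear probe per layer and report its held-out accuracy;
# tree-depth prediction is treated here as classification over depth values.
for i, (X_tr, X_te) in enumerate(zip(train_layers, test_layers)):
    probe = LogisticRegression(max_iter=1000).fit(X_tr, train_depths)
    print(f"layer {i:2d}: depth-probe accuracy = {probe.score(X_te, test_depths):.2f}")
```

A token-level POS probe would follow the same pattern, except that individual token vectors (rather than pooled sentence vectors) from each layer are fed to the per-layer classifier.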
Cite as: Chen, Y., Farrús, M. (2022) Neural Detection of Cross-lingual Syntactic Knowledge. Proc. IberSPEECH 2022, 151-155, doi: 10.21437/IberSPEECH.2022-31
@inproceedings{chen22_iberspeech,
  author={Yongjian Chen and Mireia Farrús},
  title={{Neural Detection of Cross-lingual Syntactic Knowledge}},
  year=2022,
  booktitle={Proc. IberSPEECH 2022},
  pages={151--155},
  doi={10.21437/IberSPEECH.2022-31}
}