We propose a multilingual personality classifier that takes text from social media and YouTube vlog transcriptions and maps it to the Big Five personality traits using a Convolutional Neural Network (CNN). We first train unsupervised bilingual word embeddings on an English-Chinese parallel corpus and use these word representations as input to our CNN. This allows the model to achieve relatively high cross-lingual and multilingual performance on Chinese texts, for example after training on the English dataset. We also train monolingual Chinese embeddings on a large Chinese text corpus and then train our CNN model on a Chinese dataset of conversational dialogue labeled with personality. We achieve an average F-score of 66.1 on the multilingual task, compared with 63.3 in the cross-lingual setting and 63.2 in the monolingual setting.
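The pipeline summarized above (shared bilingual embeddings feeding a CNN that scores five traits) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the toy vocabulary, the dimensions, the random weights, the single filter width, and the per-trait sigmoid output are all assumptions made for illustration, and `text_cnn_forward` is a hypothetical helper name.

```python
import numpy as np

TRAITS = ["extraversion", "agreeableness", "conscientiousness",
          "neuroticism", "openness"]

def text_cnn_forward(token_ids, embeddings, filters, bias, out_w, out_b):
    """Embed -> 1D convolution -> ReLU -> max-over-time pooling -> sigmoid."""
    x = embeddings[token_ids]                       # (seq_len, emb_dim)
    width = filters.shape[1]                        # filters: (n_filters, width, emb_dim)
    # Collect every window of `width` consecutive token vectors.
    windows = np.stack([x[i:i + width] for i in range(len(x) - width + 1)])
    conv = np.einsum("nwd,fwd->nf", windows, filters) + bias  # (n_windows, n_filters)
    pooled = np.maximum(conv, 0.0).max(axis=0)      # (n_filters,)
    logits = pooled @ out_w + out_b                 # (n_traits,)
    return 1.0 / (1.0 + np.exp(-logits))            # one score per trait

# Assumption: a toy shared vocabulary in which English and Chinese words
# index rows of the same embedding matrix (standing in for trained
# bilingual embeddings).
vocab = {"i": 0, "love": 1, "parties": 2, "我": 3, "喜欢": 4, "聚会": 5}

rng = np.random.default_rng(0)
emb_dim, n_filters, width = 8, 4, 2
embeddings = 0.1 * rng.standard_normal((len(vocab), emb_dim))
filters = 0.1 * rng.standard_normal((n_filters, width, emb_dim))
bias = np.zeros(n_filters)
out_w = 0.1 * rng.standard_normal((n_filters, len(TRAITS)))
out_b = np.zeros(len(TRAITS))

# The same (here untrained) weights score an English and a Chinese sentence.
en_ids = np.array([vocab[w] for w in ["i", "love", "parties"]])
zh_ids = np.array([vocab[w] for w in ["我", "喜欢", "聚会"]])
probs_en = text_cnn_forward(en_ids, embeddings, filters, bias, out_w, out_b)
probs_zh = text_cnn_forward(zh_ids, embeddings, filters, bias, out_w, out_b)
```

Because both languages share one embedding space, a network trained on English inputs can be applied to Chinese token sequences without changing its weights, which is the property the cross-lingual setting relies on.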
Cite as: Siddique, F.B., Fung, P. (2017) Bilingual Word Embeddings for Cross-Lingual Personality Recognition Using Convolutional Neural Nets. Proc. Interspeech 2017, 3271-3275, doi: 10.21437/Interspeech.2017-1379
@inproceedings{siddique17_interspeech,
  author={Farhad Bin Siddique and Pascale Fung},
  title={{Bilingual Word Embeddings for Cross-Lingual Personality Recognition Using Convolutional Neural Nets}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3271--3275},
  doi={10.21437/Interspeech.2017-1379}
}