Sentence Embeddings and Sentence Similarity for Portuguese FAQs

Nuno Carriço, Paulo Quaresma

Virtual assistant bots are becoming essential in business models. They aim to provide customer service without the need for a human operator, so the first step is to understand what a customer needs. To achieve this, we compute the distance between the user sentence and each sentence in a set of predefined FAQs, and extract the closest FAQ. While this problem has satisfactory results for English, that is not the case for Portuguese. Therefore, we propose the use of Portuguese BERT models to obtain the sentence embeddings of both the FAQs and the user sentence, in order to compute their distance scores. The BERT models are fine-tuned on the ASSIN 2 dataset for sentence similarity tasks to achieve better performance, and the fine-tuned models were evaluated against the ASSIN 2 test set. The FAQ embeddings are inserted into a FAISS index, which is used to extract the n FAQ embeddings closest to a user sentence. The index provides an efficient way to maintain the embeddings and search for the nearest neighbors of a query data point. Given the set of FAQs, we built sample user questions, labelled with their corresponding FAQ, to test the setup.
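The retrieval and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 3-dimensional vectors stand in for BERT sentence embeddings, a brute-force scan stands in for the FAISS index, and cosine similarity is an assumed scoring function, since the abstract does not name the metric. The function names (cosine_similarity, closest_faqs, top_n_accuracy) are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_faqs(query_vec, faq_vecs, n=1):
    """Return the indices of the n FAQ vectors most similar to query_vec.

    Brute-force scan over all FAQs; an index such as FAISS would replace
    this step when the FAQ set is large.
    """
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(faq_vecs)]
    # Highest similarity first; ties broken by the lower FAQ index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:n]]


def top_n_accuracy(query_vecs, labels, faq_vecs, n=1):
    """Fraction of queries whose labelled FAQ is among the n closest FAQs."""
    hits = sum(
        1
        for query, label in zip(query_vecs, labels)
        if label in closest_faqs(query, faq_vecs, n)
    )
    return hits / len(query_vecs)


# Hypothetical toy vectors standing in for sentence embeddings.
faq_vecs = [
    [1.0, 0.0, 0.0],  # FAQ 0
    [0.0, 1.0, 0.0],  # FAQ 1
    [0.0, 0.0, 1.0],  # FAQ 2
]
user_vecs = [
    [0.9, 0.1, 0.0],  # closest to FAQ 0
    [0.1, 0.8, 0.3],  # closest to FAQ 1
    [0.6, 0.0, 0.5],  # labelled FAQ 2, but slightly closer to FAQ 0
]
labels = [0, 1, 2]

top1 = top_n_accuracy(user_vecs, labels, faq_vecs, n=1)
top2 = top_n_accuracy(user_vecs, labels, faq_vecs, n=2)
```

In this toy setup the third user vector is retrieved correctly only when two neighbors are returned, which shows why extracting the n closest FAQs rather than a single one can matter when testing with labelled sample questions.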

doi: 10.21437/IberSPEECH.2021-43

Carriço, N, Quaresma, P (2021) Sentence Embeddings and Sentence Similarity for Portuguese FAQs. Proc. IberSPEECH 2021, 200-204, doi: 10.21437/IberSPEECH.2021-43.