Sentence Similarity Recognition in Portuguese from Multiple Embedding Models.

ICMLA(2022)

Abstract
Distinct pre-trained embedding models perform differently in sentence similarity recognition tasks. The current assumption is that they encode different features due to differences in algorithm design and in the characteristics of the datasets used during pre-training. The prospect of exploiting these differently encoded features to generate more suitable representations motivated combining multiple embedding models, an approach known as meta-embedding. Meta-embedding methods combine different pre-trained embedding models to perform a task. Recently, multiple pre-trained language representations derived from Transformer-based architectures have been shown to be effective in many downstream tasks. This paper introduces a supervised meta-embedding neural network that combines contextualized pre-trained models for sentence similarity recognition in Portuguese. Our results show that combining multiple pre-trained sentence embedding models outperforms single models and can be a promising alternative for improving performance on sentence similarity. Moreover, we discuss the results using our simple extension of a model explainability method to the meta-embedding context, which allows visual identification of the impact of each token on the sentence similarity score.
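
To make the idea of meta-embedding concrete, the following is a minimal sketch of the simplest unsupervised variant: concatenating L2-normalized sentence vectors from several models and scoring a sentence pair by cosine similarity. It is not the supervised neural network introduced in the paper, whose architecture is not described in this abstract. The function names and the toy vectors are hypothetical stand-ins for the outputs of real sentence encoders.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def concat_meta_embedding(embeddings: Sequence[Sequence[float]]) -> Vector:
    """Build a meta-embedding by concatenating per-model sentence vectors.

    Each vector is normalized first so that no single model dominates
    the combined representation merely because of its scale.
    """
    meta: Vector = []
    for vec in embeddings:
        meta.extend(l2_normalize(vec))
    return meta


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors (assumption): each inner list plays the role of one
# hypothetical encoder's embedding of the same sentence.
sentence_a = [[1.0, 0.0], [0.0, 2.0, 0.0]]
sentence_b = [[1.0, 0.0], [0.0, 0.0, 3.0]]

meta_a = concat_meta_embedding(sentence_a)
meta_b = concat_meta_embedding(sentence_b)
score = cosine_similarity(meta_a, meta_b)
print(f"similarity: {score:.2f}")
```

In this toy case the two hypothetical encoders disagree: the first sees identical sentences and the second sees orthogonal ones, so the combined score lands halfway between. A supervised approach such as the one the paper proposes would instead learn how to weight or transform each model's contribution from labeled sentence pairs.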
Keywords
sentence similarity, STS, embedding, meta-embedding, Portuguese sentence similarity