SEBGM: Sentence embedding based on generation model with multi-task learning

Computer Speech & Language (2024)

Abstract
Sentence embedding, which aims to learn an effective representation of a sentence, plays a significant role in downstream tasks. Recently, most sentence-embedding methods combining contrastive learning with pre-trained models have achieved encouraging results. However, on the one hand, these methods use discrete data augmentation to obtain the positive samples for contrastive learning, which can distort the original semantics of sentences. On the other hand, most methods directly adopt contrastive frameworks designed for computer vision, which can constrain contrastive training because text data are discrete and sparse compared with image data. To address these issues, we design SEBGM, a novel contrastive framework based on a generation model with multi-task learning, trained with supervised contrastive learning on a natural language inference (NLI) dataset to obtain meaningful sentence embeddings. SEBGM uses multi-task learning to better exploit the word-level and sentence-level semantic information of samples; in this way, its positive samples come from NLI rather than from data augmentation. Extensive experiments show that, by utilizing multi-task learning, SEBGM advances the state of the art in sentence embedding on semantic textual similarity (STS) tasks.
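The abstract does not give SEBGM's exact training objective, but the core idea it describes, drawing positives from NLI entailment pairs instead of data augmentation, matches the supervised contrastive setup popularized by supervised SimCSE. The sketch below illustrates that style of loss in PyTorch; the function name nli_supervised_contrastive_loss, the temperature value, and the use of contradiction hypotheses as hard negatives are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def nli_supervised_contrastive_loss(anchor, positive, negative, temperature=0.05):
    """SimCSE-style supervised contrastive loss over NLI triplets (a sketch,
    not the paper's verified objective).

    anchor:   (B, D) embeddings of premises
    positive: (B, D) embeddings of entailment hypotheses (NLI positives)
    negative: (B, D) embeddings of contradiction hypotheses (hard negatives)
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negative = F.normalize(negative, dim=-1)

    # Cosine similarity of each anchor to every positive and every hard
    # negative in the batch; in-batch non-matching positives also act as
    # negatives.
    sim_pos = anchor @ positive.T / temperature   # (B, B)
    sim_neg = anchor @ negative.T / temperature   # (B, B)
    logits = torch.cat([sim_pos, sim_neg], dim=1)  # (B, 2B)

    # The matching entailment (diagonal of sim_pos) is the correct "class".
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, labels)
```

In such a setup, this loss would typically be combined with the paper's other objectives (e.g., a word-level task) as a weighted multi-task sum, though the abstract does not specify the weighting.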
Keywords
Sentence embedding, Contrastive learning, Multi-task learning