HICL: Hashtag-Driven In-Context Learning for Social Media Natural Language Understanding
arxiv(2023)
摘要
Natural language understanding (NLU) is integral to various social media
applications. However, existing NLU models rely heavily on context for semantic
learning, resulting in compromised performance when faced with short and noisy
social media content. To address this issue, we leverage in-context learning
(ICL), wherein language models learn to make inferences by conditioning on a
handful of demonstrations to enrich the context and propose a novel
hashtag-driven in-context learning (HICL) framework. Concretely, we pre-train a
model #Encoder, which employs #hashtags (user-annotated topic labels) to drive
BERT-based pre-training through contrastive learning. Our objective here is to
enable #Encoder to gain the ability to incorporate topic-related semantic
information, which allows it to retrieve topic-related posts to enrich contexts
and enhance social media NLU with noisy contexts. To further integrate the
retrieved context with the source text, we employ a gradient-based method to
identify trigger terms useful in fusing information from both sources. For
empirical studies, we collected 45M tweets to set up an in-context NLU
benchmark, and the experimental results on seven downstream tasks show that
HICL substantially advances the previous state-of-the-art results. Furthermore,
we conducted extensive analyzes and found that: (1) combining source input with
a top-retrieved post from #Encoder is more effective than using semantically
similar posts; (2) trigger words can largely benefit in merging context from
the source and retrieved posts.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要