DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment
arxiv(2024)
摘要
Recent research demonstrates the effectiveness of using pre-trained language
models for legal case retrieval. Most of the existing works focus on improving
the representation ability for the contextualized embedding of the [CLS] token
and calculate relevance using textual semantic similarity. However, in the
legal domain, textual semantic similarity does not always imply that the cases
are relevant enough. Instead, relevance in legal cases primarily depends on the
similarity of key facts that impact the final judgment. Without proper
treatments, the discriminative ability of learned representations could be
limited since legal cases are lengthy and contain numerous non-key facts. To
this end, we introduce DELTA, a discriminative model designed for legal case
retrieval. The basic idea involves pinpointing key facts in legal cases and
pulling the contextualized embedding of the [CLS] token closer to the key facts
while pushing away from the non-key facts, which can warm up the case embedding
space in an unsupervised manner. To be specific, this study brings the word
alignment mechanism to the contextual masked auto-encoder. First, we leverage
shallow decoders to create information bottlenecks, aiming to enhance the
representation ability. Second, we employ the deep decoder to enable
translation between different structures, with the goal of pinpointing key
facts to enhance discriminative ability. Comprehensive experiments conducted on
publicly available legal benchmarks show that our approach can outperform
existing state-of-the-art methods in legal case retrieval. It provides a new
perspective on the in-depth understanding and processing of legal case
documents.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要