Improving Domain Adaptation through Extended-Text Reading Comprehension
CoRR(2024)
Abstract
To enhance the domain-specific capabilities of large language models,
continued pre-training on a domain-specific corpus is a prevalent method.
Recent work demonstrates that adapting models using reading comprehension data
formatted by regex-based patterns can significantly improve performance on
domain-specific tasks. However, regex-based patterns are incapable of parsing
raw corpora using domain-specific knowledge. Furthermore, the question and
answer pairs, which are extracted directly from the corpus in predefined
formats, offer limited context. To address these limitations, we improve
reading comprehension via an LLM and clustering. The LLM leverages domain
knowledge within the corpus to refine the comprehension stage, while clustering
supplies relevant knowledge by extending the context to enrich the reading stage.
Additionally, our method incorporates parameter-efficient fine-tuning to
improve the efficiency of domain adaptation. In comparison to AdaptLLM, our
method achieves an improvement exceeding 5% in domain-specific tasks. Our code
will be available at https://github.com/microsoft/LMOps.
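
The idea of extending the reading context with related passages can be illustrated with a small sketch. The abstract does not specify the clustering algorithm or the text representation, so everything below is an assumption made for illustration: the function names (`bag_of_words`, `cosine`, `extend_context`) are hypothetical, word counts stand in for a real embedding model, and a simple nearest-neighbour grouping stands in for clustering. It is not the authors' implementation.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a stand-in for a real embedding model (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extend_context(docs, k=1):
    """Append each document's k most similar neighbours as extra reading context.

    Nearest-neighbour grouping is used here in place of a real clustering step.
    """
    vectors = [bag_of_words(d) for d in docs]
    extended = []
    for i, doc in enumerate(docs):
        # Rank every other document by similarity to the current one.
        ranked = sorted(
            (j for j in range(len(docs)) if j != i),
            key=lambda j: cosine(vectors[i], vectors[j]),
            reverse=True,
        )
        neighbours = [docs[j] for j in ranked[:k]]
        extended.append("\n\n".join([doc] + neighbours))
    return extended


# Toy corpus mixing two domains (made-up sentences, for illustration only).
docs = [
    "Aspirin reduces fever and relieves mild pain.",
    "Ibuprofen relieves pain and reduces fever and inflammation.",
    "A bond pays a fixed coupon until maturity.",
    "The coupon rate of a bond is fixed at issuance.",
]

for passage in extend_context(docs, k=1):
    print(passage, end="\n---\n")
```

In this toy corpus each biomedical sentence is paired with the other biomedical sentence and each finance sentence with the other finance sentence, so the passage presented for reading carries related in-domain material beyond the original document.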