Aligners: Decoupling LLMs and Alignment
arXiv (2024)
Abstract
Large Language Models (LLMs) need to be aligned with human expectations to
ensure their safety and utility in most applications. Alignment is challenging,
costly, and needs to be repeated for every LLM and alignment criterion. We
propose to decouple LLMs and alignment by training aligner models that can be
used to align any LLM for a given criterion on an as-needed basis, thus also
reducing the potential negative impacts of alignment on performance. Our recipe
for training the aligner models relies solely on synthetic data generated with
a (prompted) LLM and can be easily adjusted for a variety of alignment
criteria. We illustrate our method by training an "ethical" aligner and verify
its efficacy empirically.
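
To make the decoupling idea concrete, here is a minimal sketch of how a separately trained aligner might be composed with an arbitrary base LLM at inference time. The abstract does not specify the interface, so this assumes the aligner acts as a post-hoc rewriter that receives the prompt and the base model's draft response. All names (`with_aligner`, `toy_llm_a`, `toy_llm_b`, `toy_aligner`) are hypothetical, and the toy functions are plain string manipulations standing in for real models.

```python
from typing import Callable

# Hypothetical interfaces (assumptions, not taken from the paper):
# a base LLM maps a prompt to a response; an aligner maps
# (prompt, draft response) to a revised response.
BaseLLM = Callable[[str], str]
Aligner = Callable[[str, str], str]


def with_aligner(llm: BaseLLM, aligner: Aligner) -> BaseLLM:
    """Wrap any base LLM so that its output is revised by the aligner."""
    def aligned(prompt: str) -> str:
        draft = llm(prompt)            # the base model is left untouched
        return aligner(prompt, draft)  # alignment is applied only when needed
    return aligned


# Toy stand-ins for two different base models and one aligner.
def toy_llm_a(prompt: str) -> str:
    return f"A says: {prompt}"


def toy_llm_b(prompt: str) -> str:
    return f"B says: {prompt}"


def toy_aligner(prompt: str, response: str) -> str:
    # Stand-in for a trained aligner: a trivial string rewrite.
    return response.replace("says", "politely says")


# The same aligner is reused across both base models without retraining either.
aligned_a = with_aligner(toy_llm_a, toy_aligner)
aligned_b = with_aligner(toy_llm_b, toy_aligner)
```

Under this assumed interface, swapping the alignment criterion would mean passing a different aligner to `with_aligner`, while the base models stay unchanged.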