Some things are more CRINGE than others: Preference Optimization with the Pairwise Cringe Loss

CoRR (2023)

Abstract
Practitioners commonly align large language models using pairwise preferences, i.e., labels of the form "response A is preferred to response B" for a given input. Perhaps less commonly, methods have also been developed for binary feedback, i.e., training models given labels of the form "response A is good" or "response A is bad". We show how an existing performant binary feedback method, the Cringe Loss (Adolphs et al., 2022), can be generalized to the pairwise preference setting using a simple soft margin extension. Pairwise Cringe Loss is straightforward to implement and efficient to train, and we find it outperforms state-of-the-art preference optimization algorithms such as PPO and DPO on the AlpacaFarm benchmark.
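To make the "soft margin extension" idea concrete, here is a minimal sketch of one way a binary-feedback loss could be gated for pairwise data. It is an illustration under stated assumptions, not the paper's exact formulation: the function names, the use of sequence log-probabilities as the score, and the `margin` and `temperature` parameters are all hypothetical choices that the abstract does not specify.

```python
import math


def soft_margin_gate(logp_preferred, logp_rejected, margin=1.0, temperature=1.0):
    """Sigmoid gate on the log-probability gap between two responses.

    Assumption (not stated in the abstract): the gate is close to 1 when the
    preferred response does not yet beat the rejected one by `margin`, and
    close to 0 once it does, so well-separated pairs contribute little loss.
    """
    gap = logp_preferred - logp_rejected
    z = (margin - gap) / temperature
    # Numerically stable sigmoid: avoid exp() of a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_pairwise_loss(binary_loss, logp_preferred, logp_rejected,
                        margin=1.0, temperature=1.0):
    """Scale a precomputed binary-feedback loss by the soft margin gate.

    `binary_loss` stands in for whatever loss the binary method assigns when
    the preferred response is treated as "good" and the rejected one as "bad".
    """
    gate = soft_margin_gate(logp_preferred, logp_rejected, margin, temperature)
    return gate * binary_loss
```

With this kind of gating, a pair whose preferred response already scores well above the rejected one is effectively skipped, while a pair the model still ranks incorrectly receives nearly the full binary loss.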