Bayesian Preference Elicitation with Language Models
arXiv (2024)
Abstract
Aligning AI systems to users' interests requires understanding and
incorporating humans' complex values and preferences. Recently, language models
(LMs) have been used to gather information about the preferences of human
users. This preference data can be used to fine-tune or guide other LMs and/or
AI systems. However, LMs have been shown to struggle with crucial aspects of
preference learning: quantifying uncertainty, modeling human mental states, and
asking informative questions. These challenges have been addressed in other
areas of machine learning, such as Bayesian Optimal Experimental Design (BOED),
which focus on designing informative queries within a well-defined feature
space. But these methods, in turn, are hard to scale and apply to
real-world problems where simply identifying the relevant features can be
difficult. We introduce OPEN (Optimal Preference Elicitation with Natural
language), a framework that uses BOED to guide the choice of informative
questions and an LM to extract features and translate abstract BOED queries
into natural language questions. By combining the flexibility of LMs with the
rigor of BOED, OPEN can optimize the informativity of queries while remaining
adaptable to real-world domains. In user studies, we find that OPEN outperforms
existing LM- and BOED-based methods for preference elicitation.
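
To make the BOED half of this idea concrete, the following is a minimal sketch of how an informative pairwise query could be chosen by expected information gain. It is not the paper's implementation. It assumes a linear utility over already-extracted features, a Bradley-Terry (logistic) response model, and a weighted-sample approximation of the posterior; every function name here is hypothetical. The LM steps described above (extracting features and phrasing the chosen query in natural language) are outside its scope.

```python
import math
import random
from itertools import combinations


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_entropy(p):
    # Entropy (in nats) of a Bernoulli(p) response.
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def choice_prob(w, a, b):
    # Assumed response model: P(user prefers a over b | weights w).
    diff = sum(wi * (ai - bi) for wi, ai, bi in zip(w, a, b))
    return sigmoid(diff)


def expected_information_gain(particles, weights, a, b):
    # Mutual information between the user's answer and the weights:
    # H[E_w p(y | w)] - E_w H[p(y | w)].
    probs = [choice_prob(w, a, b) for w in particles]
    marginal = sum(pw * p for pw, p in zip(weights, probs))
    cond = sum(pw * binary_entropy(p) for pw, p in zip(weights, probs))
    return binary_entropy(marginal) - cond


def select_query(particles, weights, items):
    # Return the index pair (i, j) whose comparison is most informative.
    return max(
        combinations(range(len(items)), 2),
        key=lambda ij: expected_information_gain(
            particles, weights, items[ij[0]], items[ij[1]]
        ),
    )


def update_posterior(particles, weights, a, b, prefers_a):
    # Bayes' rule on the sample weights after observing one answer.
    new = []
    for w, pw in zip(particles, weights):
        p = choice_prob(w, a, b)
        new.append(pw * (p if prefers_a else 1.0 - p))
    total = sum(new)
    return [x / total for x in new]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical items described by two features each.
    items = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (1.0, 1.0)]
    # Prior samples over preference weights (standard normal, an assumption).
    particles = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
    weights = [1.0 / len(particles)] * len(particles)
    i, j = select_query(particles, weights, items)
    print("ask the user to compare items", i, "and", j)
    weights = update_posterior(particles, weights, items[i], items[j], True)
```

In a loop, `select_query` and `update_posterior` would alternate until a question budget runs out. A pair on which every sampled weight vector agrees has an information gain near zero, so it is never selected over a pair that splits the samples.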