The Alignment Problem from a Deep Learning Perspective: A Position Paper

ICLR 2024

Abstract
AI systems based on deep learning have reached or surpassed human performance in a range of narrow domains. In coming decades, artificial general intelligence (AGI) may surpass human capabilities at many critical tasks. In this position paper, we examine the technical difficulty of fine-tuning hypothetical AGI systems based on pretrained deep models to pursue goals that are aligned with human interests. We argue that, if trained like today's most capable models, AGI systems could learn to act deceptively to receive higher reward, learn internally-represented goals which generalize beyond their fine-tuning distributions, and pursue those goals using power-seeking strategies. We review emerging evidence for these properties. AGIs with these properties would be difficult to align and may appear aligned even when they are not.
Keywords
Alignment, Safety, AGI, position paper