Pix2Code: Learning to Compose Neural Visual Concepts as Programs
CoRR (2024)
Abstract
The challenge in learning abstract concepts from images in an unsupervised
fashion lies in the required integration of visual perception and generalizable
relational reasoning. Moreover, the unsupervised nature of this task makes it
necessary for human users to be able to understand a model's learnt concepts
and potentially revise false behaviours. To tackle both the generalizability
and interpretability constraints of visual concept learning, we propose
Pix2Code, a framework that extends program synthesis to visual relational
reasoning by utilizing the abilities of both explicit, compositional symbolic
and implicit neural representations. This is achieved by retrieving object
representations from images and synthesizing relational concepts as
lambda-calculus programs. We evaluate the diverse properties of Pix2Code on two
challenging reasoning domains, Kandinsky Patterns and CURI, thereby testing its
ability to identify compositional visual concepts that generalize to novel data
and concept configurations. In particular, in stark contrast to neural
approaches, we show that Pix2Code's representations remain human interpretable
and can be easily revised for improved performance.
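
To make the idea of "relational concepts as lambda-calculus programs" concrete, the following is a minimal illustrative sketch, not Pix2Code's actual DSL or API. It assumes object representations have already been extracted from an image (here they are written by hand), and every name in it (`Obj`, `exists`, `same`, `eq`, `concept`) is hypothetical. It shows a concept composed from small reusable primitives, and how a human could revise the concept by swapping one sub-expression.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Obj:
    """Hypothetical symbolic object representation (assumed extractor output)."""
    shape: str
    color: str
    size: str


Scene = List[Obj]

# Hypothetical primitives, written as curried lambdas so they compose like
# lambda-calculus terms.
get_shape: Callable[[Obj], str] = lambda o: o.shape
get_color: Callable[[Obj], str] = lambda o: o.color
eq = lambda attr, value: lambda o: attr(o) == value
exists = lambda pred: lambda scene: any(pred(o) for o in scene)
same = lambda attr: lambda scene: len({attr(o) for o in scene}) <= 1

# Illustrative concept: "the scene contains a triangle and all objects share one color".
concept = lambda scene: exists(eq(get_shape, "triangle"))(scene) and same(get_color)(scene)

# Revision by a human: replace one sub-expression (triangle -> circle) and keep the rest.
revised_concept = lambda scene: exists(eq(get_shape, "circle"))(scene) and same(get_color)(scene)

positive = [Obj("triangle", "red", "small"), Obj("square", "red", "big")]
negative = [Obj("triangle", "red", "small"), Obj("square", "blue", "big")]

print(concept(positive), concept(negative), revised_concept(positive))
```

Because the concept is an explicit composition of named primitives, it can be read and edited directly, which is the property the abstract contrasts with purely neural representations.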