Concept-based Analysis of Neural Networks via Vision-Language Models
arXiv (2024)
Abstract
Formal analysis of vision-based deep neural networks (DNNs) is highly
desirable but very challenging, because formal specifications for vision
tasks are difficult to express and efficient verification procedures are
lacking. In this paper, we propose to leverage emerging multimodal
vision-language foundation models (VLMs) as a lens through which we can reason
about vision models. VLMs have been trained on a large body of images
accompanied by textual descriptions, and are thus implicitly aware of
high-level, human-understandable concepts describing the images. We describe a
logical specification language, _, designed to
facilitate writing specifications in terms of these concepts. To define and
formally check _ specifications, we leverage a
VLM, which provides a means to encode and efficiently check natural-language
properties of vision models. We demonstrate our techniques on a ResNet-based
classifier trained on the RIVAL-10 dataset, using CLIP as the multimodal
model.
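As a rough illustration of checking a concept-based property through a VLM's shared image-text embedding space, the sketch below scores each concept by cosine similarity between an image embedding and a concept's text embedding (the CLIP-style zero-shot scoring idea). It then checks a toy property: every input the classifier labels "truck" must score higher on "metallic" than on "furry". All names (check_spec, concept_scores), the concepts, the class label, and the 2-D vectors are hypothetical stand-ins for real CLIP embeddings. This is a minimal sketch under those assumptions, not necessarily how the paper defines or checks _ specifications.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def concept_scores(image_emb: Vector, concept_embs: Dict[str, Vector]) -> Dict[str, float]:
    """Score every concept against one image embedding (hypothetical helper)."""
    return {name: cosine(image_emb, emb) for name, emb in concept_embs.items()}


def check_spec(
    samples: List[Tuple[Vector, str]],
    concept_embs: Dict[str, Vector],
    predicted_class: str,
    strong: str,
    weak: str,
) -> List[int]:
    """Hypothetical property: for every sample the classifier labels
    `predicted_class`, concept `strong` must outscore concept `weak`.
    Returns the indices of violating samples (empty list means it holds)."""
    violations = []
    for idx, (image_emb, label) in enumerate(samples):
        if label != predicted_class:
            continue  # the property only constrains this class
        scores = concept_scores(image_emb, concept_embs)
        if not scores[strong] > scores[weak]:
            violations.append(idx)
    return violations


# Toy stand-ins for CLIP text embeddings of two concept prompts.
concepts = {"metallic": [1.0, 0.0], "furry": [0.0, 1.0]}
# (image embedding, classifier's predicted label) pairs; values are made up.
data = [([0.9, 0.1], "truck"), ([0.2, 0.8], "truck"), ([0.1, 0.9], "cat")]

print(check_spec(data, concepts, "truck", "metallic", "furry"))  # prints [1]
```

Sample 1 is labeled "truck" but sits closer to the "furry" direction, so it is reported as a violation; the "cat" sample is skipped because the property does not constrain it. With a real VLM, the embeddings would come from the model's image and text encoders rather than being written by hand.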