Concept-based Analysis of Neural Networks via Vision-Language Models

arXiv (2024)

Abstract
Formal analysis of vision-based deep neural networks (DNNs) is highly desirable but very challenging, owing to the difficulty of expressing formal specifications for vision tasks and the lack of efficient verification procedures. In this paper, we propose to leverage emerging multimodal vision-language foundation models (VLMs) as a lens through which we can reason about vision models. VLMs have been trained on a large body of images accompanied by their textual descriptions, and are thus implicitly aware of high-level, human-understandable concepts describing the images. We describe a logical specification language designed to facilitate writing specifications in terms of these concepts. To define and formally check such specifications, we leverage a VLM, which provides a means to encode and efficiently check natural-language properties of vision models. We demonstrate our techniques on a ResNet-based classifier trained on the RIVAL-10 dataset, leveraging CLIP as the multimodal model.
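To make the idea of checking a concept-level property through a VLM more concrete, here is a minimal sketch. It is not taken from the paper. It assumes that an image and each concept name have already been mapped into a shared embedding space (as a CLIP-style model would do), and it uses hand-written toy vectors in place of real embeddings. The function names, the concept names "wheels" and "ears", and the example property ("if the classifier predicts truck, then wheels should be more strongly represented than ears") are all hypothetical illustrations.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def concept_stronger(image_emb, concept_embs, a, b):
    """True if concept `a` is closer to the image than concept `b`.

    Assumption: similarity in the shared embedding space stands in for
    how strongly a concept is represented in the image.
    """
    return cosine(image_emb, concept_embs[a]) > cosine(image_emb, concept_embs[b])

def check_spec(predicted, image_emb, concept_embs, label, a, b):
    """Check the implication: predicted == label  =>  a stronger than b.

    The property holds vacuously when the prediction is not `label`.
    """
    if predicted != label:
        return True
    return concept_stronger(image_emb, concept_embs, a, b)

# Toy stand-ins for text embeddings of concept names (hypothetical values).
CONCEPTS = {"wheels": [1.0, 0.0, 0.0], "ears": [0.0, 1.0, 0.0]}

# Toy stand-ins for image embeddings (hypothetical values).
truck_image = [0.9, 0.1, 0.2]
cat_image = [0.1, 0.8, 0.3]

# A truck-like image predicted as "truck" satisfies the property.
ok = check_spec("truck", truck_image, CONCEPTS, "truck", "wheels", "ears")

# A cat-like image predicted as "truck" violates it.
violated = not check_spec("truck", cat_image, CONCEPTS, "truck", "wheels", "ears")
```

In a real setting the toy vectors would be replaced by embeddings produced by the VLM, and the check would be run over many inputs or over a region of the embedding space rather than over two hand-picked points.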