Assessing the Interpretability of Programmatic Policies with Large Language Models
CoRR (2023)
Abstract
Although the synthesis of programs encoding policies often carries the
promise of interpretability, systematic evaluations have never been performed to
assess the interpretability of these policies, likely because of the complexity
of such an evaluation. In this paper, we introduce a novel metric that uses
large language models (LLMs) to assess the interpretability of programmatic
policies. For our metric, an LLM is given both a program and a description of
its associated programming language. The LLM then formulates a natural language
explanation of the program. This explanation is subsequently fed into a second
LLM, which tries to reconstruct the program from the natural-language
explanation. Our metric then measures the behavioral similarity between the
reconstructed program and the original. We validate our approach with
synthesized and human-crafted programmatic policies for playing a real-time
strategy game, comparing the interpretability scores of these programmatic
policies to obfuscated versions of the same programs. Our LLM-based
interpretability score consistently ranks less interpretable programs lower and
more interpretable ones higher. These findings suggest that our metric could
serve as a reliable and inexpensive tool for evaluating the interpretability of
programmatic policies.
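
The explain-then-reconstruct pipeline described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the toy one-instruction language (`attack_if_gt N`), the helper names (`compile_policy`, `stub_explain`, `stub_reconstruct`, `interpretability_score`), and the choice of action agreement over sampled states as the behavioral similarity measure are all assumptions made here. In practice the two stub functions would be replaced by calls to LLMs.

```python
from typing import Callable, Sequence

# A policy maps a (toy) integer state to an action name.
Policy = Callable[[int], str]


def compile_policy(program: str) -> Policy:
    # Hypothetical toy language: "attack_if_gt N" attacks when state > N.
    threshold = int(program.split()[1])
    return lambda state: "attack" if state > threshold else "harvest"


def behavioral_similarity(a: Policy, b: Policy, states: Sequence[int]) -> float:
    # Assumed measure: fraction of states where both policies pick the same action.
    if not states:
        raise ValueError("need at least one state")
    return sum(1 for s in states if a(s) == b(s)) / len(states)


def stub_explain(program: str, language_description: str) -> str:
    # Stand-in for the first LLM: program -> natural-language explanation.
    threshold = program.split()[1]
    return f"Attack when the state value exceeds {threshold}; otherwise harvest."


def stub_reconstruct(explanation: str, language_description: str) -> str:
    # Stand-in for the second LLM: explanation -> program.
    tokens = (tok.rstrip(";.") for tok in explanation.split())
    number = next(tok for tok in tokens if tok.isdigit())
    return f"attack_if_gt {number}"


def interpretability_score(
    program: str,
    language_description: str,
    explain: Callable[[str, str], str],
    reconstruct: Callable[[str, str], str],
    states: Sequence[int],
) -> float:
    # 1) explain the program, 2) rebuild it from the explanation alone,
    # 3) compare the behavior of the original and the rebuilt program.
    explanation = explain(program, language_description)
    rebuilt = reconstruct(explanation, language_description)
    return behavioral_similarity(
        compile_policy(program), compile_policy(rebuilt), states
    )


if __name__ == "__main__":
    description = "attack_if_gt N: attack when state > N, else harvest"
    score = interpretability_score(
        "attack_if_gt 5", description, stub_explain, stub_reconstruct, range(10)
    )
    print(score)
```

Under this sketch, an explanation that loses or distorts information yields a reconstructed program that disagrees with the original on some states, so the score drops; this mirrors the comparison between original and obfuscated programs described in the abstract.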