Formalization Quality in Isabelle

Fabian Huch, Yiannos Stathopoulos

CICM 2023

Abstract
Little is known about the quality of formalizations in interactive theorem proving. In this work, we analyze the relationship between static analysis warnings (lints) and maintenance effort for 6470 Isabelle theories, create models to predict lints based on structural features, and compare the results to a small ground-truth dataset collected with the help of domain experts. We find that for the majority of lints, there is a significant but low-strength correlation between frequency of occurrence and churn in maintenance change-sets. In particular, for proofs using tactic methods (which can be brittle), the Spearman correlation is highest with a strength of 0.16 (p < 0.005). Furthermore, when classifying theories as lint-prone based on their formal entity graphs (which capture the dependencies between underlying logical entities), random forests outperform even deep learning models on our data, achieving 58% precision and 21% recall. Finally, in our ground-truth dataset of 35 good and 35 problematic theories, our pre-defined criterion that identifies theories with more than one lint every 100 lines achieves 95% precision and 51% recall. Remarkably, this is very close to the optimal criterion, which we observe at one lint every 109 lines. Moreover, the random forest model trained for lint-proneness even achieves perfect accuracy at 43% recall, providing additional evidence of its effectiveness.
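To make the two kinds of analysis in the abstract concrete, the following is a minimal sketch (not the authors' code) using hypothetical per-theory data: it computes the Spearman correlation between lint counts and maintenance churn, and evaluates the pre-defined lint-density criterion (flag a theory with more than one lint per 100 lines) against ground-truth labels. The data values, and the use of scipy/scikit-learn, are illustrative assumptions.

```python
from scipy.stats import spearmanr
from sklearn.metrics import precision_score, recall_score

# Hypothetical per-theory measurements: lint counts, theory length in lines,
# churn in maintenance change-sets, and a ground-truth label
# (True = problematic theory).
lints = [3, 0, 12, 1, 7, 0]
lines = [250, 400, 300, 180, 520, 90]
churn = [40, 5, 130, 10, 80, 2]
problematic = [True, False, True, False, True, False]

# Correlation between lint frequency and churn (the paper reports a
# highest strength of 0.16 for tactic-method lints).
rho, p_value = spearmanr(lints, churn)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")

# Pre-defined criterion: more than one lint per 100 lines of theory text.
predicted = [l / n > 1 / 100 for l, n in zip(lints, lines)]
print("precision:", precision_score(problematic, predicted))
print("recall:", recall_score(problematic, predicted))
```

The same evaluation loop would apply to the optimal threshold the paper reports (one lint every 109 lines) by changing the density cutoff.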
Keywords
formalization