Improving Cross-Language Code Clone Detection via Code Representation Learning and Graph Neural Networks

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING(2023)

引用 0|浏览12
暂无评分
摘要
Code clone detection is an important aspect of software development and maintenance. The extensive research in this domain has helped reduce the complexity and increase the robustness of source code, thereby assisting bug detection tools. However, the majority of the clone detection literature is confined to a single language. With the increasing prevalence of cross-platform applications, functionality replication across multiple languages is common, resulting in code fragments having similar functionality but belonging to different languages. Since such clones are syntactically unrelated, single language clone detection tools are not applicable in their case. In this article, we propose a semi-supervised deep learning-based tool Rubhus, capable of detecting clones across different programming languages. Rubhus uses the control and data flow enriched abstract syntax trees (ASTs) of code fragments to leverage their syntactic and structural information and then applies graph neural networks (GNNs) to extract this information for the task of clone detection. We demonstrate the effectiveness of our proposed system through experiments conducted over datasets consisting of Java, C, and Python programs and evaluate its performance in terms of precision, recall, and F1 score. Our results indicate that Rubhus outperforms the state-of-the-art cross-language clone detection tools.
更多
查看译文
关键词
Codes,Cloning,Syntactics,Semantics,Java,Task analysis,Source coding,Program representation learning,cross-language code clone detection,graph-based neural networks,abstract syntax trees
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要