Summarizing source code through heterogeneous feature fusion and extraction

INFORMATION FUSION(2024)

引用 0|浏览2
暂无评分
摘要
Code summarization, which seeks to automatically produce a succinct natural-language description to summarize the functionality of source code, plays an essential role in maintaining the software. Currently, plentiful approaches have been proposed to first encode the source code based on its Abstract Syntax Tree (AST), and then decode it into a textual summary. However, most existing works interpret the AST-based syntax structure as a homogeneous graph, without discriminating the different relations between graph nodes (e.g., the parent-child and sibling relations) in a heterogeneous way. To mitigate this issue, this paper proposes HetCoS to extract the syntactic and sequential features of source code by exploring its inherent heterogeneity for code summarization. Specifically, we first build a Heterogeneous Code Graph (HCG) that fuses the syntax structure and code sequence with eight types of edges/relations designed between graph nodes. Moreover, we present a heterogeneous graph neural network for capturing the diverse relations in HCG. The represented HCG is then fed into a Transformer decoder, followed by a multi-head attention-based copying mechanism to support high-quality summary generation. Extensive experiments on the major Java and Python datasets illustrate the superiority of our approach over sixteen state-of-the-art baselines. To promote reproducibility studies, we make the implementation of HetCoS publicly accessible at https://github.com/GJCEXP/HETCOS.
更多
查看译文
关键词
Code summarization,Feature fusion,Heterogeneous graph,Graph neural network,Transformer
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要