Multi-Modal Code Summarization with Retrieved Summary

2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM), 2022

Abstract
A high-quality code summary concisely describes the functionality and purpose of a code snippet, which is key to program comprehension. Automatic code summarization aims to generate natural language summaries from code snippets, saving developers time and improving efficiency in development and maintenance. Recent work mainly uses neural machine translation (NMT) based approaches for this task, applying a neural model to translate code snippets into natural language summaries. However, the performance of existing NMT-based approaches is limited. Although a summary and a code snippet are semantically related, they may not share common lexical tokens or language structures. This semantic gap between code and summaries hinders the effectiveness of NMT-based models, and representing a code snippet by its tokens alone cannot help these models overcome it. To address this problem, we propose a code summarization approach that incorporates the lexical, syntactic and semantic modalities of code. We treat code tokens as the lexical modality and the abstract syntax tree (AST) as the syntactic modality. To obtain the semantic modality, inspired by translation memory (TM) in NMT, we use information retrieval (IR) to retrieve a relevant summary that describes the functionality of a code snippet. We propose a novel contrastive-learning-based approach to build a retrieval model that retrieves semantically similar summaries. Our approach learns and fuses these modalities using a Transformer. We evaluate our approach on a large Java dataset. Experimental results show that it outperforms state-of-the-art approaches on the automatic evaluation metrics BLEU, ROUGE and METEOR by 10%, 8% and 9%, respectively.
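
To make the retrieval idea concrete, the following is a minimal sketch of contrastive training and nearest-summary retrieval. It is not the paper's implementation: the abstract does not specify the loss, encoder, or similarity function, so the in-batch InfoNCE-style loss, cosine similarity, the temperature value, and the function names `info_nce_loss` and `retrieve` are all assumptions chosen for illustration. The vectors stand in for embeddings that a learned encoder would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(code_vecs, summary_vecs, temperature=0.1):
    """Assumed in-batch contrastive loss (InfoNCE style).

    The positive for code i is summary i; every other summary in the
    batch acts as a negative. Lower loss means matched pairs are closer
    than mismatched ones.
    """
    total = 0.0
    for i, code_vec in enumerate(code_vecs):
        logits = [cosine(code_vec, s) / temperature for s in summary_vecs]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(code_vecs)


def retrieve(query_vec, summary_vecs, summaries):
    """Return the stored summary whose embedding is most similar to the query."""
    scores = [cosine(query_vec, s) for s in summary_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return summaries[best]


# Toy embeddings (hypothetical; a real system would use a trained encoder).
code_vecs = [[1.0, 0.0], [0.0, 1.0]]
summary_vecs = [[1.0, 0.0], [0.0, 1.0]]
summaries = ["reads a file into a string", "sorts a list in place"]

aligned_loss = info_nce_loss(code_vecs, summary_vecs)
misaligned_loss = info_nce_loss(code_vecs, list(reversed(summary_vecs)))
retrieved = retrieve([0.9, 0.1], summary_vecs, summaries)
```

In this toy setup the loss is small when each code vector is paired with its own summary and large when the pairs are swapped, which is the signal a contrastive retriever would be trained on. The retrieved summary would then be passed to the summarization model as the semantic modality alongside the code tokens and the AST.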
Keywords
program comprehension, code summarization, information retrieval, neural machine translation, contrastive learning