The Landscape of Source Code Representation Learning in AI-Driven Software Engineering Tasks.

ICSE Companion(2023)

引用 0|浏览11
暂无评分
摘要
Appropriate representation of source code and its relevant properties form the backbone of Artificial Intelligence (AI)/ Machine Learning (ML) pipelines for various software engineering (SE) tasks such as code classification, bug prediction, code clone detection, and code summarization. In the literature, researchers have extensively experimented with different kinds of source code representations (syntactic, semantic, integrated, customized) and ML techniques such as pre-trained BERT models. In addition, it is common for researchers to create hand-crafted and customized source code representations for an appropriate SE task. In a 2018 survey [1], Allamanis et al. listed nearly 35 different ways of of representing source code for different SE tasks like Abstract Syntax Trees (ASTs), customized ASTs, Control Flow Graphs (CFGs), Data Flow Graphs (DFGs) and so on. The main goal of this tutorial is two-fold (i) Present an overview of the state-of-the-art of source code representations and corresponding ML pipelines with an explicit focus on the merits and demerits of each of the representations (ii) Practical challenges in infusing different code-views in the state-of-the-art ML models and future research directions.
更多
查看译文
关键词
Code Representation,Machine Learning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要