Source code author identification with unsupervised feature learning

Pattern Recognition Letters(2013)

Abstract
Automatic identification of source code authors has applications in fields such as source code plagiarism detection and lawsuit prosecution. This paper presents a new source code author identification system based on an unsupervised feature learning technique. As a method of extracting features from high-dimensional data, unsupervised feature learning has achieved great success in fields such as character recognition and image classification. However, to our knowledge, it has not previously been applied to source code author identification. We therefore investigated an unsupervised feature learning technique, the sparse auto-encoder, as a method of extracting features from source code files. We evaluated the system on several datasets, and the results show that its performance is very close to that of state-of-the-art techniques in source code author identification.
Keywords
source code identification field, source code author, unsupervised feature learning, identification system, automatic identification, new source code author, source code plagiarism detection, source code author identification, unsupervised feature, source code file, auto encoder, logistic regression
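
To make the sparse auto-encoder mentioned in the abstract more concrete, here is a minimal NumPy sketch of the general technique. It is not the paper's implementation: the abstract gives no architecture or hyperparameters, so the layer size, the sparsity target `rho`, the penalty weight `beta`, the weight decay `lam`, the learning rate, and the function names `train_sparse_autoencoder` and `encode` are all assumptions chosen for illustration. The input is assumed to be a matrix of per-file feature vectors scaled to [0, 1]; how the paper derives such vectors from source files is not stated here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sparse_autoencoder(X, n_hidden=8, rho=0.05, beta=3.0, lam=1e-4,
                             lr=0.1, epochs=300, seed=0):
    """Train a one-hidden-layer sparse auto-encoder with batch gradient descent.

    X is (n_samples, n_features), assumed scaled to [0, 1].
    All hyperparameter defaults are illustrative assumptions.
    Returns (params, losses).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        # Forward pass: encode, then reconstruct the input.
        H = sigmoid(X @ W1 + b1)
        Y = sigmoid(H @ W2 + b2)
        # Mean activation of each hidden unit, clipped for numerical safety.
        rho_hat = np.clip(H.mean(axis=0), 1e-6, 1.0 - 1e-6)
        # KL-divergence sparsity penalty between target rho and rho_hat.
        kl = np.sum(rho * np.log(rho / rho_hat)
                    + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))
        loss = (0.5 * np.sum((Y - X) ** 2) / n
                + 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
                + beta * kl)
        losses.append(loss)
        # Backward pass.
        dZ2 = (Y - X) * Y * (1.0 - Y) / n
        dW2 = H.T @ dZ2 + lam * W2
        db2 = dZ2.sum(axis=0)
        sparsity = beta * (-rho / rho_hat + (1.0 - rho) / (1.0 - rho_hat)) / n
        dZ1 = (dZ2 @ W2.T + sparsity) * H * (1.0 - H)
        dW1 = X.T @ dZ1 + lam * W1
        db1 = dZ1.sum(axis=0)
        # Gradient descent update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}, losses

def encode(X, params):
    """Map inputs to learned hidden features (the encoder half only)."""
    return sigmoid(X @ params["W1"] + params["b1"])

if __name__ == "__main__":
    # Hypothetical stand-in data: random vectors, not real source code features.
    X_demo = np.random.default_rng(1).random((40, 12))
    params, losses = train_sparse_autoencoder(X_demo)
    print(encode(X_demo, params).shape, losses[0], losses[-1])
```

The hidden activations returned by `encode` are the learned features. They could then be passed to a supervised classifier, for example the logistic regression listed among the keywords, to predict the author of each file; that pairing is an inference from the keyword list rather than a detail given in the abstract.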