Research on Code Plagiarism Detection Model Based on Random Forest and Gradient Boosting Decision Tree

Huang Qiubo, Tang Jingdong, Fang Guozheng

Proceedings of the 2019 International Conference on Data Mining and Machine Learning (2019)


Abstract

This paper studies Online Judge systems used for programming assignments, where code submitted by students is sometimes plagiarized [1]. In addition to calculating the similarity degree between codes, we extract other features to determine whether a submitted code is suspected of plagiarism. By combining Random Forest and Gradient Boosting Decision Tree, we can also obtain its suspicion level. The model first calculates the similarity degree between the newly submitted code and all previously submitted codes and identifies plagiarism suspects. For codes where it is difficult to confirm whether plagiarism occurred, we extract the programming style similarity degree, the student's submission behavior pattern (such as similar target concentration degree), and other features, and build tree-based models (Random Forest and Gradient Boosting Decision Trees) that help determine the level of plagiarism suspicion. If the level is medium, the teacher marks the code as plagiarized or not. Finally, the learning model is incrementally trained to improve its accuracy and classification results. Experimental results show that the accuracy can reach 95.9%. As a result, the model can deter students from plagiarizing while minimizing the teacher's workload.
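
The first stage described in the abstract (comparing a new submission against all earlier ones and setting aside the unclear cases) can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not say how the similarity degree is computed, so Python's `difflib.SequenceMatcher` is swapped in as a stand-in, and the function names and the thresholds `clean_below` and `suspect_above` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


def normalize(code: str) -> str:
    """Collapse whitespace runs so pure formatting changes do not affect the score."""
    return " ".join(code.split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; an assumed stand-in for the
    paper's similarity degree, which the abstract does not define."""
    return SequenceMatcher(None, normalize(a), normalize(b), autojunk=False).ratio()


def most_similar(new_code: str, previous: Dict[str, str]) -> Tuple[Optional[str], float]:
    """Compare the new submission with every earlier one and return the
    (submission id, score) of the closest match, or (None, 0.0) if there are none."""
    best_id, best_score = None, 0.0
    for submission_id, code in previous.items():
        score = similarity(new_code, code)
        if score > best_score:
            best_id, best_score = submission_id, score
    return best_id, best_score


def screen(score: float, clean_below: float = 0.5, suspect_above: float = 0.9) -> str:
    """Coarse screening with hypothetical thresholds. 'unclear' cases are the
    ones that would be passed on to the feature-based tree models."""
    if score >= suspect_above:
        return "suspect"
    if score < clean_below:
        return "clean"
    return "unclear"


if __name__ == "__main__":
    earlier = {"s1": "int main() { return 0; }", "s2": "print('hello')"}
    match_id, score = most_similar("int  main()  {\n  return 0;\n}", earlier)
    print(match_id, score, screen(score))
```

Submissions that land in the "unclear" band are the ones for which the abstract's additional features (programming style similarity, submission behavior pattern) would be extracted and fed to the Random Forest and Gradient Boosting models.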


Keywords

Code Plagiarism Detection, Code Similarity Degree, Gradient Boosting Decision Tree, Random Forest
