Classification of Programming Problems based on Topic Modeling

Proceedings of the 2019 7th International Conference on Information and Education Technology(2019)

引用 13|浏览23
暂无评分
摘要
Programming skill is one of the most important and demanding skill in the current generation. In order to enable learners and programmers to practice programming and gain problem-solving skills, many Online Judge (OJ) systems exist. Most of these OJ systems have to be operated solely by students and learners. These students and novice programmers sometimes compete against each other or solve the programming problems by themselves in offline mode. But, most OJ systems have their problems arranged simply into volumes and various contests events. This arrangement system does not have any clear indication of the difficulties and categories of problems. Thus, in this paper, we have studied reliable techniques on the extraction of keywords and features which can categorize these OJ system's programming problems into their respective types and skills. We have leveraged two popular topic modeling algorithms, Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) to extract relevant features. Afterward, six classifiers were trained on these topic modeling features and Naive TF-IDF features. From our studies, we discovered that topic modeling features were relatively smaller in dimensionality, yet matched the performance when trained on high dimensional naive TF-IDF features. Our main goal was to understand the precise trade-off between accuracy and dimensionality of the textual data of programming problem statements. This experiment has enabled us to obtain important tags, hint, and classification of Online Judge programming problems.
更多
查看译文
关键词
Online Judge Systems, feature extraction, novice programmer, text classification, topic modeling
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要