Reducing the size of training datasets in the classification of online discussions

2021 International Conference on Advanced Learning Technologies (ICALT) (2021)

Abstract
Supervised machine learning models have been widely used to classify messages in online discussions. Supervised learning algorithms require a large set of annotated data to build an accurate predictive model. However, data annotation is a complex task for three reasons: (i) it depends on specialists to label data accurately; (ii) it is often time-consuming and labour-intensive; and (iii) in educational settings, it is not always easy to collect the substantial volume of data required by machine learning algorithms. This paper presents an active learning-based approach that can reduce the amount of annotated data required to build machine learning models for the classification of educational data. The results show that, with only 20% of the annotated data, the proposed approach achieved results similar to those reported in previous works that used the complete databases to train the machine learning model.
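The abstract does not say which query strategy the approach uses. As an illustration only, the sketch below shows one common active learning step, entropy-based uncertainty sampling: given a model's predicted class probabilities for the unlabelled messages, it picks the messages the model is least sure about and sends them to the annotator. The function names `entropy` and `select_queries` and the sample probabilities are hypothetical and are not taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy of one predicted class distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(pool_probs, batch_size):
    """Return the indices of the `batch_size` most uncertain unlabelled items.

    pool_probs: list of per-item class probability lists, assumed to come
    from whatever classifier is being trained (an assumption of this sketch).
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,  # most uncertain first
    )
    return ranked[:batch_size]


# Hypothetical predicted probabilities for four unlabelled messages.
pool = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4], [0.99, 0.01]]
to_annotate = select_queries(pool, batch_size=2)
print(to_annotate)
```

In a full loop, the selected messages would be labelled, added to the training set, and the model retrained, repeating until a labelling budget is spent (for example, a budget matching the 20% figure reported above).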
Keywords
Active Learning, Community of Inquiry, Online Discussions, Text Classification