Understanding Dropouts in MOOCs

AAAI, pp. 517-524, 2019.

Keywords:
dropout behavior, good performance, Hidden Markov Models, Context-aware Feature Interaction Network, Self-paced Mode
Weibo:
We propose a context-aware feature interaction network to predict users’ dropout probability

Abstract:

Massive open online courses (MOOCs) have developed rapidly in recent years and have attracted millions of online users. However, a central challenge is the extremely high dropout rate: recent reports show that the completion rate in MOOCs is below 5% (Onah, Sinclair, and Boyatt 2014; Kizilcec, Piech, and Schneider 2013; Seaton et al. 20...

Introduction
  • Massive open online courses (MOOCs) have become increasingly popular. Many MOOC platforms have been launched.
  • Age is an important factor: young people are more inclined to drop out. Gender is another: roughly, female users are more likely to drop science courses, and male users are more likely to give up non-science courses. Educational background is also important.
  • This raises several interesting questions: 1) What are the major reasons for dropout? 2) What deeper motivations drive users to study or induce them to drop out? 3) Is it possible to predict users’ dropout behavior in advance, so that the MOOC platform can deliver useful interventions (Halawa, Greene, and Mitchell 2014; Qi et al. 2018)?
Highlights
  • Massive open online courses (MOOCs) have become increasingly popular
  • A survey from Coursera shows that Massive open online courses are really beneficial to the
  • On the other hand, MOOCs are criticized for their low completion ratio (He et al. 2015)
  • Age is an important factor: young people are more inclined to drop out. Gender is another: roughly, female users are more likely to drop science courses, and male users are more likely to give up non-science courses. Educational background is also important. This raises several interesting questions: 1) What are the major reasons for dropout? 2) What deeper motivations drive users to study or induce them to drop out? 3) Is it possible to predict users’ dropout behavior in advance, so that the MOOC platform can deliver useful interventions (Halawa, Greene, and Mitchell 2014; Qi et al. 2018)?
  • Based on the results of these analyses, we propose a Context-aware Feature Interaction Network (CFIN) to model and predict users’ dropout behavior
Methods
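  • Compared methods: LRC, SVM, RF, DNN, GBDT, CFIN, and CFIN-en.
  • Evaluation metrics: AUC (%) and F1 (%), reported on the KDDCUP and XuetangX datasets.
  • Feature contribution analysis on both datasets, with AUC (%) and F1 (%) reported for the feature settings All, - Video, and - Forum.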
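
The AUC and F1 metrics named in the Methods entries can be computed as in the minimal sketch below. It is illustrative only: the labels and scores are made-up toy values rather than results from the paper, and the function names and the 0.5 decision threshold are assumptions introduced here. Label 1 stands for "dropped out".

```python
from typing import Sequence


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the probability that a randomly chosen positive example
    is scored higher than a randomly chosen negative one (ties count 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def f1_score(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """F1 of the positive class after thresholding the scores (threshold is an assumption)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data: 1 = user dropped out, 0 = user stayed.
    toy_labels = [1, 1, 1, 0, 0, 1, 0, 1]
    toy_scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.7, 0.6, 0.55]
    auc = auc_score(toy_labels, toy_scores)
    f1 = f1_score(toy_labels, toy_scores)
    print(f"AUC = {auc:.3f}, F1 = {f1:.3f}")  # prints "AUC = 0.800, F1 = 0.800"
```

AUC is threshold-free and measures ranking quality, while F1 depends on the chosen threshold, which is one reason the two are usually reported together.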
Conclusion
  • The authors conduct a systematic study of the dropout problem in MOOCs. They first conduct statistical analyses to identify factors that cause users to drop out.
  • The authors found several interesting phenomena, such as dropout correlation between courses and dropout influence between friends.
  • Based on these analyses, the authors propose a context-aware feature interaction network (CFIN) to predict users’ dropout probability.
  • The proposed method has been deployed on XiaoMu, an intelligent learning assistant in XuetangX, to help improve student retention.
  • The authors are working on applying the method to several other systems such as ArnetMiner (Tang et al. 2008)
Tables
  • Table 1: Statistics of the KDDCUP dataset
  • Table 2: Statistics of the XuetangX dataset
  • Table 3: Results of clustering analysis. C1-C5: Cluster 1 to 5; CAR: average correct answer ratio
  • Table 4: Overall results on the KDDCUP dataset and IPM courses of the XuetangX dataset
  • Table 5: Contribution analysis for different engagements on the KDDCUP dataset and IPM courses of the XuetangX dataset
  • Table 6: Average attention weights of different clusters. C1-C5: Cluster 1 to 5; CAR: average correct answer ratio
  • Table 7: Results of intervention by A/B test. WVT: average time (s) of video watching; ASN: average number of completed assignments; CAR: average ratio of correct answers
Related work
  • Prior studies apply generalized linear models, including logistic regression and linear SVMs (Kloft et al. 2014; He et al. 2015), to predict dropout. Balakrishnan and Coetzee (2013) present a hybrid model that combines Hidden Markov Models (HMM) and logistic regression to predict student retention on a single course. Xing et al. (2016) use an ensemble stacking generalization approach to build robust and accurate prediction models. Deep learning methods are also used for predicting dropout. For example, Fei and Yeung (2015) treat the problem as sequence labeling and apply an RNN-based model to predict students’ dropout probability. Wang et al. (2017) propose a hybrid deep neural network for dropout prediction that combines a CNN and an RNN. Ramesh et al. (2014) develop a probabilistic soft logic (PSL) framework that predicts user retention by modeling student engagement types as latent variables. Cristea et al. (2018) propose a lightweight method that predicts dropout before a user starts learning, based only on the user's registration date. Beyond prediction itself, Nagrecha et al. (2017) focus on the interpretability of existing dropout prediction methods. Whitehill et al. (2015) design an online intervention strategy to boost users’ callback in MOOCs. Dalipi et al. (2018) review dropout prediction techniques and offer several suggestions for the task. In addition, XuetangX organized KDDCUP 2015 for dropout prediction. In that competition, most teams adopted ensembling strategies to improve prediction performance, and the “Intercontinental Ensemble” team achieved the best performance by ensembling over sixty single models.
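
Logistic regression is the simplest of the baselines mentioned above. The following is a minimal sketch of such a dropout predictor, trained with plain batch gradient descent. The two features (scaled video activity and scaled forum activity), the toy data, and all function names are hypothetical and are not taken from any of the cited papers.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_dropout_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability that a user with features x drops out."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_dropout_proba(w, b, xi) - yi  # dLoss/dz for one example
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    # Hypothetical features: [scaled video activity, scaled forum activity].
    # Label 1 = dropped out, 0 = stayed.
    X = [[0.1, 0.0], [0.2, 0.1], [0.0, 0.0], [0.3, 0.0],
         [0.8, 0.5], [0.9, 0.7], [0.7, 0.9], [1.0, 0.4]]
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    w, b = fit_logistic(X, y)
    print("inactive user:", round(predict_dropout_proba(w, b, [0.05, 0.0]), 3))
    print("active user:  ", round(predict_dropout_proba(w, b, [0.9, 0.7]), 3))
```

On this toy data the learned weights are negative, so more activity lowers the predicted dropout probability. A real baseline would add regularization and evaluate on held-out users.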
Funding
  • The work is supported by the National Natural Science Foundation of China (61631013), the Center for Massive Online Education of Tsinghua University, and XuetangX
Reference
  • Balakrishnan, G., and Coetzee, D. 2013. Predicting student retention in massive open online courses using hidden markov models. Electrical Engineering and Computer Sciences University of California at Berkeley.
  • Chen, T., and Guestrin, C. 2016. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794.
  • Cristea, A. I.; Alamri, A.; Kayama, M.; Stewart, C.; Alshehri, M.; and Shi, L. 2018. Earliest predictor of dropout in moocs: a longitudinal study of futurelearn courses.
  • Dalipi, F.; Imran, A. S.; and Kastrati, Z. 2018. Mooc dropout prediction using machine learning techniques: Review and research challenges. In Global Engineering Education Conference (EDUCON), 2018 IEEE, 1007–101IEEE.
  • Fei, M., and Yeung, D.-Y. 2015. Temporal Models for Predicting Student Dropout in Massive Open Online Courses. 2015 IEEE International Conference on Data Mining Workshop (ICDMW) 256–263.
  • Halawa, S.; Greene, D.; and Mitchell, J. 2014. Dropout prediction in moocs using learner activity features. Experiences and best practices in and around MOOCs 3–12.
  • He, J.; Bailey, J.; Rubinstein, B. I. P.; and Zhang, R. 2015. Identifying at-risk students in massive open online courses. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 1749–1755.
  • Kellogg, S. 2013. Online learning: How to make a mooc. Nature 369–371.
  • Kingma, D. P., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Kizilcec, R. F.; Piech, C.; and Schneider, E. 2013. Deconstructing disengagement: Analyzing learner subpopulations in massive open online courses. In Proceedings of the Third International Conference on Learning Analytics and Knowledge, 170–179.
  • Kloft, M.; Stiehler, F.; Zheng, Z.; and Pinkwart, N. 2014. Predicting MOOC Dropout over Weeks Using Machine Learning Methods. 60–65.
  • Nagrecha, S.; Dillon, J. Z.; and Chawla, N. V. 2017. Mooc dropout prediction: Lessons learned from making pipelines interpretable. In WWW’17, 351–359.
  • Nair, V., and Hinton, G. E. 2010. Rectified linear units improve restricted boltzmann machines. In ICML’10, 807–814.
  • Onah, D. F.; Sinclair, J.; and Boyatt, R. 2014. Dropout rates of massive open online courses: behavioural patterns. EDULEARN’14 5825–5834.
  • Perozzi, B.; Al-Rfou, R.; and Skiena, S. 2014. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 701–710.
  • Qi, Y.; Wu, Q.; Wang, H.; Tang, J.; and Sun, M. 2018. Bandit learning with implicit feedback. In NIPS’18.
  • Qiu, J.; Tang, J.; Liu, T. X.; Gong, J.; Zhang, C.; Zhang, Q.; and Xue, Y. 2016. Modeling and predicting learning behavior in moocs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, 93–102.
  • Ramesh, A.; Goldwasser, D.; Huang, B.; Daume, III, H.; and Getoor, L. 2014. Learning latent engagement patterns of students in online courses. In AAAI’14, 1272–1278.
  • Reich, J. 2015. Rebooting mooc research. Science 34–35.
  • Rousseeuw, P. J. 1987. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of computational and applied mathematics 20:53–65.
  • Seaton, D. T.; Bergner, Y.; Chuang, I.; Mitros, P.; and Pritchard, D. E. 2014. Who does what in a massive open online course? Communications of the Acm 58–65.
  • Shah, D. 2018. A product at every price: A review of mooc stats and trends in 2017. Class Central.
  • Tang, J.; Zhang, J.; Yao, L.; Li, J.; Zhang, L.; and Su, Z. 2008. Arnetminer: Extraction and mining of academic social networks. In KDD’08, 990–998.
  • Wang, W.; Yu, H.; and Miao, C. 2017. Deep model for dropout prediction in moocs. In Proceedings of the 2nd International Conference on Crowd Science and Engineering, 26–32. ACM.
  • Whitehill, J.; Williams, J.; Lopez, G.; Coleman, C.; and Reich, J. 2015. Beyond prediction: First steps toward automatic intervention in mooc student stopout.
  • Wolpert, D. H. 1992. Stacked generalization. Neural networks 5(2):241–259.
  • Xing, W.; Chen, X.; Stein, J.; and Marcinkowski, M. 2016. Temporal predication of dropouts in moocs: Reaching the low hanging fruit through stacking generalization. Computers in Human Behavior 119–129.
  • Zheng, S.; Rosson, M. B.; Shih, P. C.; and Carroll, J. M. 2015. Understanding student motivation, behaviors and perceptions in moocs. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work, 1882–1895.
  • Zhenghao, C.; Alcorn, B.; Christensen, G.; Eriksson, N.; Koller, D.; and Emanuel, E. 2015. Who’s benefiting from moocs, and why. Harvard Business Review 25.