Course Concept Extraction in MOOCs via Embedding-Based Graph Propagation

IJCNLP, pp. 875-884, 2017.

Keywords:
candidate concept, Topical PageRank, course video, course concept extraction, Mean Average Precision

Abstract:

Massive Open Online Courses (MOOCs), offering a new way to study online, are revolutionizing education. One challenging issue in MOOCs is how to design effective and fine-grained course concepts such that students with different backgrounds can grasp the essence of the course. In this paper, we conduct a systematic investigation of the problem of course concept extraction in MOOCs.

Introduction
  • In contrast with traditional courses that have limited numbers of students, each online course in a MOOC platform may draw more than 100,000 registrants (Seaton et al., 2014).
  • In MOOCs, the authors use course concepts to refer to the knowledge concepts taught in the course videos, and related topics that help students better understand course videos.
  • Figure 1 shows an example to illustrate the problem addressed in this work.
  • It is a clip of video captions from the data structure course in XuetangX 1, one of the largest MOOCs in China.
  • For a non-science student, the system may display the definitions of “unstable sorting” and “uniform distribution”, while for a science student, it could recommend advanced concepts or potential applications of “quick sort”.
Highlights
  • In contrast with traditional courses that have limited numbers of students, each online course in a Massive Open Online Course (MOOC) platform may draw more than 100,000 registrants (Seaton et al., 2014)
  • We propose an iterative graph-based algorithm, namely course concept propagation (CCP), which first assigns each vertex of the concept graph an initial confidence score and then iteratively updates the score of each vertex through propagation
  • After exploring the influence of the parameters, we compare our proposed course concept propagation (CCP) method against four baseline methods
  • We find that the proposed method outperforms all baselines on all datasets, which indicates the robustness and effectiveness of concept propagation
  • We study the problem of concept extraction in Massive Open Online Courses
  • Experimental results on evaluation datasets validate the effectiveness of the proposed method
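The propagation idea in the highlights above can be sketched as a PageRank-style iteration over an embedding-similarity graph. This is a minimal illustration under stated assumptions, not the authors' implementation: the cosine-similarity edge weights, the column normalisation, the convergence tolerance, and all names (`ccp_rank`, `alpha`, `n_iter`) are assumptions.

```python
import numpy as np

def ccp_rank(candidates, embeddings, initial_scores, alpha=0.5, n_iter=100, tol=1e-6):
    """Iteratively propagate confidence scores over a concept graph.

    Edges are weighted by cosine similarity between candidate embeddings;
    each vertex keeps a fraction (1 - alpha) of its initial confidence and
    receives the rest from its neighbours, PageRank-style.
    """
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)          # unit-normalise rows
    W = X @ X.T                                               # cosine similarities
    np.fill_diagonal(W, 0.0)                                  # no self-loops
    W = np.clip(W, 0.0, None)                                 # keep positive edges only
    W = W / np.maximum(W.sum(axis=0, keepdims=True), 1e-12)   # column-stochastic

    s0 = np.asarray(initial_scores, dtype=float)
    s0 = s0 / s0.sum()                                        # normalised initial confidence
    s = s0.copy()
    for _ in range(n_iter):
        s_next = alpha * (W @ s) + (1 - alpha) * s0           # propagate + restart
        if np.abs(s_next - s).sum() < tol:
            s = s_next
            break
        s = s_next
    order = np.argsort(-s)                                    # rank by final score
    return [(candidates[i], float(s[i])) for i in order]
```

Here α plays the role described in the results: it trades off propagated evidence against each vertex's initial confidence, so a moderate α mixes both signals.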
Methods
  • Performance comparison (Table 2): the baselines TF-IDF, PMI, TextRank, and TPR versus the proposed CCP, measured by R-precision (Rp) and MAP on the CSEN, EcoEN, and CSZH datasets.
Results
  • Evaluation Metrics

    For the experiments, the authors select two evaluation metrics. The first metric is R-precision (Rp) (Zesch and Gurevych, 2009), which is an IR metric that focuses on ranking.
  • The second metric is Mean Average Precision (MAP), which has been the preferred metric in information retrieval for evaluating ranked lists.
  • The authors investigate the influence of α in Figure 2(a)
  • The authors show the R-precision of CCP when λ = 0.3, N = 100, and α ranges from 0 to 1.
  • From this figure the authors find that, when α is set between 0.2 and 0.7, the performance is consistently good and remains stable across variations of α.
  • When α is larger than 0.85, the R-precision drops on all datasets.
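The two evaluation metrics above can be computed as follows. The helper names are hypothetical, but the definitions follow the standard IR formulations of R-precision and MAP:

```python
def r_precision(ranked, relevant):
    """R-precision: precision at rank R, where R is the number of relevant items."""
    R = len(relevant)
    if R == 0:
        return 0.0
    return sum(1 for item in ranked[:R] if item in relevant) / R

def mean_average_precision(ranked_lists, relevant_sets):
    """MAP: mean over queries of the average precision of each ranked list."""
    ap_values = []
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        hits, precisions = 0, []
        for i, item in enumerate(ranked, start=1):
            if item in relevant:
                hits += 1
                precisions.append(hits / i)   # precision at each relevant hit
        ap_values.append(sum(precisions) / len(relevant) if relevant else 0.0)
    return sum(ap_values) / len(ap_values) if ap_values else 0.0
```

For example, with the ranked list ["a", "b", "c", "d"] and relevant set {"a", "c"}, R-precision is 1/2 and average precision is (1/1 + 2/3) / 2 = 5/6.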
Conclusion
  • The authors study the problem of concept extraction in MOOCs. They precisely define the problem and propose a graph-based propagation method to extract course concepts by incorporating external knowledge from online encyclopedias.
  • Experimental results on evaluation datasets validate the effectiveness of the proposed method.
  • Incorporating external knowledge of various kinds to help extract course concepts is an intriguing direction for future research.
  • A straightforward next step is to incorporate structured information, such as “is-a” relations, into the proposed model.
Tables
  • Table 1: Dataset Statistics
  • Table 2: Performance of Different Methods on Different Datasets
Related work
  • Our work is relevant to automatic keyphrase extraction, which concerns “the automatic extraction of important and topical phrases from the body of a document” (Turney, 2000). Keyphrase extraction techniques can generally be classified into two groups: supervised and unsupervised approaches. In supervised machine-learning approaches, the training phase usually involves a classification task: each phrase in the document is either a keyphrase or not (You et al., 2013). Different learning algorithms have been employed to train the classifier, including naïve Bayes (Frank et al., 1999; Witten et al., 1999), decision trees (Turney, 2000), maximum entropy (Yih et al., 2006; Kim and Kan, 2009), and support vector machines (Lopez and Romary, 2010; Kim and Kan, 2009). Unsupervised approaches usually assign a saliency score to each candidate phrase by considering various features (Wan and Xiao, 2008); information such as tf-idf, co-occurrence, or neighboring documents is frequently used. For example, TextRank (Mihalcea and Tarau, 2004) is a well-known method that ranks keywords based on a co-occurrence graph. Huang et al. (2006) utilize co-occurrence information to construct a semantic network for each document and derive the importance of phrases by analyzing the network. The ExpandRank model (Wan and Xiao, 2008) uses a set of neighborhood documents to enhance single-document keyphrase extraction. More recently, Liu et al. (2015) proposed a framework that extracts quality phrases from text corpora integrated with phrasal segmentation; this model also relies on local statistical information and requires a relatively large corpus.
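As a concrete example of the unsupervised, co-occurrence-based family, a TextRank-style ranker might look like the following minimal sketch. The window size, damping factor, and unweighted edges are simplifying assumptions, not the exact configuration of Mihalcea and Tarau (2004):

```python
from collections import defaultdict

def textrank_keywords(tokens, window=2, d=0.85, n_iter=50):
    """Minimal TextRank: build a co-occurrence graph over tokens within a
    sliding window, then run PageRank-style iteration to score each word."""
    graph = defaultdict(set)
    for i in range(len(tokens)):
        # link each token to the tokens that follow it within the window
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[i] != tokens[j]:
                graph[tokens[i]].add(tokens[j])
                graph[tokens[j]].add(tokens[i])
    scores = {w: 1.0 for w in graph}
    for _ in range(n_iter):
        new = {}
        for w in graph:
            # each neighbour shares its score evenly among its own neighbours
            rank = sum(scores[v] / len(graph[v]) for v in graph[w])
            new[w] = (1 - d) + d * rank
        scores = new
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Words that co-occur with many well-connected words accumulate higher scores, which is the intuition the course concept propagation method builds on when it replaces plain co-occurrence with embedding-based edges.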
References
  • Gabor Berend and Richard Farkas. 2010. SZTERGAK: Feature engineering for keyphrase extraction. In Proceedings of the 5th International Workshop on Semantic Evaluation (SemEval@ACL 2010), Uppsala, Sweden, pages 186–189.
  • David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022.
  • Sergey Brin and Lawrence Page. 1998. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1–7):107–117.
  • Kenneth Ward Church and Patrick Hanks. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1):22–29.
  • Ted Dunning. 1993. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1):61–74.
  • Eibe Frank, Gordon W. Paynter, Ian H. Witten, Carl Gutwin, and Craig G. Nevill-Manning. 1999. Domain-specific keyphrase extraction. In Proceedings of IJCAI, pages 668–673.
  • Luit Gazendam, Christian Wartena, and Rogier Brussee. 2010. Thesaurus based term ranking for keyword extraction. In Database and Expert Systems Applications (DEXA) International Workshops, Bilbao, Spain, pages 49–53.
  • Kazi Saidul Hasan and Vincent Ng. 2014. Automatic keyphrase extraction: A survey of the state of the art. In Proceedings of ACL, pages 1262–1273.
  • Toru Hisamitsu, Yoshiki Niwa, and Jun'ichi Tsujii. 2000. A method of measuring term representativeness - baseline method using co-occurrence distribution. In Proceedings of COLING, pages 320–326.
  • Chong Huang, YongHong Tian, Zhi Zhou, Charles X. Ling, and Tiejun Huang. 2006. Keyphrase extraction using semantic networks structure analysis. In Proceedings of ICDM, pages 275–284.
  • Anette Hulth. 2003. Improved automatic keyword extraction given more linguistic knowledge. In Proceedings of EMNLP, pages 216–223.
  • John S. Justeson and Slava M. Katz. 1995. Technical terminology: some linguistic properties and an algorithm for identification in text. Natural Language Engineering, 1(1):9–27.
  • Su Nam Kim and Min-Yen Kan. 2009. Re-examining automatic keyphrase extraction approaches in scientific articles. In Proceedings of the ACL-IJCNLP Workshop on Multiword Expressions, pages 9–16.
  • Ioannis Korkontzelos, Ioannis P. Klapaftis, and Suresh Manandhar. 2008. Reviewing and evaluating automatic term recognition techniques. In Proceedings of GoTAL, pages 248–259.
  • Sujian Li, Jiwei Li, Tao Song, Wenjie Li, and Baobao Chang. 2013. A novel topic model for automatic term extraction. In Proceedings of SIGIR, pages 885–888.
  • Jialu Liu, Jingbo Shang, Chi Wang, Xiang Ren, and Jiawei Han. 2015. Mining quality phrases from massive text corpora. In Proceedings of ACM SIGMOD, pages 1729–1744.
  • Zhiyuan Liu, Wenyi Huang, Yabin Zheng, and Maosong Sun. 2010. Automatic keyphrase extraction via topic decomposition. In Proceedings of EMNLP, pages 366–376.
  • Patrice Lopez and Laurent Romary. 2010. HUMB: Automatic key term extraction from scientific articles in GROBID. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 248–251.
  • Olena Medelyan and Ian H. Witten. 2006. Thesaurus based automatic keyphrase indexing. In Proceedings of JCDL, pages 296–297.
  • Rada Mihalcea and Paul Tarau. 2004. TextRank: Bringing order into text. In Proceedings of EMNLP, pages 404–411.
  • Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Proceedings of NIPS, pages 3111–3119.
  • Marco Rospocher, Sara Tonelli, Luciano Serafini, and Emanuele Pianta. 2012. Corpus-based terminological evaluation of ontologies. Applied Ontology, 7(4):429–448.
  • Gerard Salton and Chris Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24(5):513–523.
  • Daniel T. Seaton, Yoav Bergner, Isaac L. Chuang, Piotr Mitros, and David E. Pritchard. 2014. Who does what in a massive open online course? Communications of the ACM, 57(4):58–65.
  • Peter D. Turney. 2000. Learning algorithms for keyphrase extraction. Information Retrieval, 2(4):303–336.
  • Jorge Vivaldi and Horacio Rodríguez. 2010. Finding domain terms using Wikipedia. In Proceedings of LREC 2010.
  • Xiaojun Wan and Jianguo Xiao. 2008. Single document keyphrase extraction using neighborhood knowledge. In Proceedings of AAAI, pages 855–860.
  • Ian H. Witten, Gordon W. Paynter, Eibe Frank, Carl Gutwin, and Craig G. Nevill-Manning. 1999. KEA: Practical automatic keyphrase extraction. In ACM Conference on Digital Libraries, pages 254–255.
  • Wen-tau Yih, Joshua Goodman, and Vitor R. Carvalho. 2006. Finding advertising keywords on web pages. In Proceedings of WWW, pages 213–222.
  • Wei You, Dominique Fontaine, and Jean-Paul A. Barthès. 2013. An automatic keyphrase extraction system for scientific documents. Knowledge and Information Systems, 34(3):691–724.
  • Torsten Zesch and Iryna Gurevych. 2009. Approximate matching for evaluating keyphrase extraction. In Proceedings of RANLP, pages 484–489.