Measuring Beginner Friendliness of Chinese Web Pages explaining Academic Concepts using Deep Learning and Text / HTML Features

Bingcai HAN, Hayato SHIOKAWA, Shintaro OKADA, Chiharu HIROHANA, Takehito UTSURO, Yasuhide KAWADA, Noriko KANDO

Semantic Scholar (2019)

Abstract
Search engines are an important tool in modern academic study, but their results come with no measure of beginner friendliness. To make search engines more efficient for academic study, a method is needed for measuring the beginner friendliness of Web pages that explain academic concepts, along with an automatic measurement system. In this thesis, we first formalize the measurement of beginner friendliness in terms of several individual factors, such as definitions and formulas. We collect about 2,000 Web pages, rate them manually on these factors, and build a reference dataset. We then analyze the HTML data of the collected pages and extract specific features for measuring beginner friendliness. We also use a modified VGG16 model (a convolutional neural network for image classification) to assess the layout of the collected Web pages, and we use its outputs as additional features. All features are evaluated using an SVM, and performance is reported as a recall-precision curve. Finally, we test about 300 Web pages and evaluate the performance of the different HTML features and the CNN results on Chinese Web pages. The results of this thesis should serve as an important reference for future work on a practical assistance system for Web learning.
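The recall-precision curve mentioned in the abstract can be illustrated with a minimal sketch. It assumes each page has a classifier confidence score (for example, an SVM decision value) and a manual binary label; the function name `recall_precision_curve` and the example scores and labels are hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple


def recall_precision_curve(
    scores: List[float], labels: List[int]
) -> List[Tuple[float, float]]:
    """Return (recall, precision) points, one per rank cutoff.

    scores: classifier confidence that a page is beginner friendly
            (for example, an SVM decision value); higher means more confident.
    labels: 1 if the page was manually judged beginner friendly, else 0.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_positive = sum(labels)
    if total_positive == 0:
        raise ValueError("at least one positive label is required")

    # Rank pages from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = []
    true_positive = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_positive += label
        recall = true_positive / total_positive  # share of friendly pages found
        precision = true_positive / rank         # share of retrieved pages that are friendly
        points.append((recall, precision))
    return points


# Hypothetical scores and labels, invented purely for illustration.
example_scores = [0.9, 0.8, 0.4, 0.3, 0.1]
example_labels = [1, 0, 1, 0, 0]
curve = recall_precision_curve(example_scores, example_labels)
for recall, precision in curve:
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

Each point corresponds to accepting the top-ranked pages down to a given cutoff, so comparing the curves produced by different feature sets shows which features keep precision high as recall grows.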