Meaningful string extraction based on clustering for improving webpage classification

CHINA COMMUNICATIONS (2012)

Abstract
Webpage classification differs from traditional text classification in its irregular words and phrases and its massive, unlabeled features, which make it harder to obtain effective features. To cope with this problem, we propose two scenarios for extracting meaningful strings, based on document clustering and on term clustering with multiple strategies, to optimize a Vector Space Model (VSM) and thereby improve webpage classification. The results show that document clustering works better than term clustering in coping with document content. However, better overall performance is obtained by using spectral clustering with document clustering. Moreover, because images appear in the same webpage as the document content, the proposed method is also applied to extract meaningful terms for images, and the experimental results again show its effectiveness in improving webpage classification.
Keywords
webpage classification, meaningful string extraction, document clustering, term clustering, k-means, spectral clustering
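
The abstract names document clustering over a Vector Space Model with k-means as one of its building blocks. The sketch below is only a rough illustration of that general idea, not the authors' method: the whitespace tokenizer, the smoothed TF-IDF weighting, the deterministic k-means initialization, and the use of top-weighted centroid terms as a stand-in for "meaningful strings" are all assumptions made for this example, and the function names (`tfidf_vectors`, `kmeans`, `top_terms`) are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build a simple L2-normalized TF-IDF Vector Space Model (assumed weighting)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF; this particular formula is an illustrative choice.
        vec = [tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def kmeans(vectors, k, iters=20):
    """Plain k-means; the first k vectors seed the centroids (assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def top_terms(vocab, centroid, n=3):
    """Return the n highest-weighted terms of a centroid (ties broken by name)."""
    ranked = sorted(zip(vocab, centroid), key=lambda p: (-p[1], p[0]))
    return [term for term, weight in ranked[:n] if weight > 0]


# Toy documents standing in for webpage text (made up for illustration).
docs = [
    "football match score goal",
    "stock market price shares",
    "football team goal league",
    "market shares trading price",
]
vocab, vectors = tfidf_vectors(docs)
labels, centroids = kmeans(vectors, k=2)
for c, centroid in enumerate(centroids):
    print(c, top_terms(vocab, centroid))
```

In this toy setting, the terms that dominate each cluster centroid could then be kept as features of a reduced VSM before training a classifier. Spectral clustering, which the abstract reports as giving the best overall performance, would replace the `kmeans` step and is not shown here.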