A graph based method for Arabic document indexing

2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT) (2016)

Abstract
Extracting knowledge from text data and taking full advantage of it is an important way to reduce computation and accelerate processing, especially for large volumes of data. Accordingly, different approaches and methodologies for modeling and representing textual data have been proposed. This paper proposes a graph-based approach for automatically indexing unstructured data from an Arabic corpus. First, each document in the collection is represented as a graph. Once the document graph has been generated, term weights are computed to estimate the relevance of each term to the document. The graph representation allows much more expressive document modeling than the standard bag-of-words approach and consequently improves classification performance. Experimental results show that the graph-based indexing method is a promising approach for semantic and contextual indexing and outperforms the statistical method (TFIDF) by 12% in F-measure.
Keywords
Arabic Text Mining,TFIDF,TextRank,graph representation,indexation,classification
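
The abstract does not say how the document graph is built or how term weights are derived from it. As a rough illustration only, the sketch below assumes a term co-occurrence graph and a TextRank-style iterative score (TextRank appears in the keywords). The function names, the `window` and `damping` parameters, and the sample tokens are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Build an undirected graph whose nodes are terms.

    Assumption: two distinct terms are linked when they occur within
    `window` positions of each other in the token sequence.
    """
    graph = defaultdict(set)
    for i, term in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != term:
                graph[term].add(other)
                graph[other].add(term)
    return graph


def rank_terms(graph, damping=0.85, iterations=50):
    """Score each term with an unweighted TextRank-style iteration."""
    nodes = list(graph)
    if not nodes:
        return {}
    scores = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each neighbour shares its score equally among its own neighbours.
            received = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            updated[node] = (1 - damping) / len(nodes) + damping * received
        scores = updated
    return scores


if __name__ == "__main__":
    # Hypothetical, already tokenized Arabic sample (no stemming or stop-word removal).
    tokens = ["فهرسة", "الوثائق", "العربية", "تمثيل", "الوثائق", "بيان"]
    weights = rank_terms(build_cooccurrence_graph(tokens))
    for term, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
        print(term, round(weight, 4))
```

A real pipeline for Arabic text would also need tokenization, normalization, and stemming before graph construction. Those steps are omitted here because the abstract does not describe them.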