Detection And Retargeting Of Emphasized Text For Content Summarization

2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2016

Abstract
In this paper, we propose a simple and robust algorithm for the detection and retargeting of emphasized words, written in italics, in a scanned document page. Italics are detected by applying Principal Component Analysis (PCA) to a selected subset of pixels taken from the boundary of the input character image. The proposed method is font and style invariant. Localizing the emphasized words supports information retrieval by means of the retargeted words. A good number of publishing houses use emphasized (italic) words to mark keywords, authors' affiliations, etc., on the front page of articles. Our method extracts and retargets the emphasized words to summarize the content of the papers. Experimental results show the robustness and degree of precision of the method.
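
The abstract does not specify which boundary pixels are selected or how the PCA output is turned into an italic/upright decision. As a rough illustration only, the following sketch assumes a binarized character image, uses all boundary pixels, and measures how far the first principal axis leans from the vertical. The function names and the 8-degree threshold are hypothetical choices, not the paper's.

```python
import numpy as np

def boundary_pixels(img):
    """Return (row, col) coordinates of foreground pixels with at least
    one background pixel in their 4-neighbourhood."""
    img = np.asarray(img, dtype=bool)
    padded = np.pad(img, 1, constant_values=False)
    # A pixel is interior when all four of its neighbours are foreground.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return np.argwhere(img & ~interior)

def principal_slant_degrees(img):
    """Angle in degrees between the first principal axis of the boundary
    pixels and the vertical; positive means the glyph leans to the right."""
    pts = boundary_pixels(img).astype(float)
    # Use x to the right and y upward so a right-leaning stroke gives a positive angle.
    coords = np.stack([pts[:, 1], -pts[:, 0]], axis=1)
    coords -= coords.mean(axis=0)
    cov = np.cov(coords, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, np.argmax(eigvals)]  # direction of largest variance
    if v[1] < 0:
        v = -v  # make the axis point upward so the sign is well defined
    return float(np.degrees(np.arctan2(v[0], v[1])))

def looks_italic(img, threshold_deg=8.0):
    """Hypothetical decision rule: the threshold is an illustrative assumption."""
    return principal_slant_degrees(img) > threshold_deg
```

On a synthetic vertical bar the estimated angle is close to zero, while a bar sheared to the right yields a clearly positive angle. Real glyphs with dominant horizontal or curved strokes would presumably need the more careful pixel selection that the abstract alludes to.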
Keywords
Emphasized words detection, principal component analysis, information retrieval