Utilizing Image-Based Features In Biomedical Document Classification

2015 IEEE International Conference on Image Processing (ICIP), 2015

Abstract
Images form a rich information source that remains underutilized in biomedical document classification. We present work that uses both image- and text-based features to identify articles of interest, in this case articles pertaining to cis-regulatory modules in the context of gene networks. Extending our recently introduced idea of using OCR-based features to identify DNA content in images, we combine image- and text-based classifiers to categorize documents as relevant or irrelevant to cis-regulatory modules. Using a set of hundreds of articles, marked by experts as relevant or irrelevant to cis-regulatory modules, we train and test image-based classifiers, text-based classifiers, and classifiers integrating both. Our results indicate that the integrated classifiers perform best, with Recall, F-measure and Utility measures all above 0.9, demonstrating the significance of incorporating image data, and specifically OCR-based features, into the document categorization process. Moreover, the use of character distribution properties to represent images is directly relevant to other biomedical images containing text (e.g. RNA, proteins). Diagrams and other images containing text are also prevalent outside the biomedical domain, so the work stands to be applicable and beneficial in other application areas.
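As a rough illustration of what "character distribution properties" of OCR output might look like, the sketch below computes a few simple statistics over a string that is assumed to have already been extracted from an image by some OCR tool. The abstract does not list the actual features used in the paper, so the function name `char_distribution_features` and the specific features (per-base frequencies, overall A/C/G/T fraction, longest uninterrupted A/C/G/T run) are hypothetical choices made for this example only.

```python
from collections import Counter

DNA_ALPHABET = set("ACGT")


def char_distribution_features(ocr_text: str) -> dict:
    """Summarize how DNA-like a string of OCR output looks.

    Hypothetical feature set for illustration; not the paper's own.
    """
    upper = ocr_text.upper()
    letters = [ch for ch in upper if ch.isalpha()]
    total = len(letters)
    counts = Counter(letters)

    # Relative frequency of each nucleotide letter among all letters.
    features = {
        base: (counts[base] / total if total else 0.0) for base in "ACGT"
    }

    # Share of letters that belong to the DNA alphabet.
    dna_letters = sum(counts[base] for base in "ACGT")
    features["dna_fraction"] = dna_letters / total if total else 0.0

    # Longest run of consecutive A/C/G/T characters; any other
    # character (including spaces and digits) breaks the run.
    longest = current = 0
    for ch in upper:
        if ch in DNA_ALPHABET:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    features["longest_dna_run"] = longest
    return features


sequence_like = char_distribution_features("ACGTACGTTTGACA")
caption_like = char_distribution_features("Figure 1: Expression levels")
```

Under these assumptions, text recognized from a sequence figure would tend to score a high `dna_fraction` and a long `longest_dna_run`, while an ordinary caption would score low on both. Such a feature vector could then be passed to any standard classifier, alone or alongside text-based features.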
Keywords
image-based features, OCR, document classification, document representation, bioinformatics