Layout-aware information extraction from semi-structured medical images.

Computers in Biology and Medicine (2019)

Abstract
Textual information embedded in medical images contains rich structured information about a patient's medical condition. This paper aims to extract structured textual information from semi-structured medical images. Given the text spans recognized in an image by optical character recognition (OCR), structured information extraction is challenging because of the spatial discontinuity of the text spans and the potential errors introduced by OCR. We propose a domain-specific language, called ODL, which allows users to describe the value and layout of text data contained in the images. Based on the value and spatial constraints described in ODL, the ODL parser associates values found in the image with the data structure in the ODL description while conforming to those constraints. In experiments on a dataset of real medical images, our ODL parser consistently outperforms existing approaches in extraction accuracy, showing better tolerance of incorrectly recognized text and of positional variance between images. This accuracy can be further improved by learning from a few manual corrections.
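The idea of pairing value constraints with spatial constraints can be illustrated with a minimal Python sketch. This is not ODL syntax and not the paper's parser: the `Span` and `Field` classes, the `extract` function, the "same row, to the right of the label" rule, and the `max_dy` tolerance are all hypothetical names and assumptions chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Span:
    """One OCR text span (hypothetical structure): text plus a position."""
    text: str
    x: float  # left edge, in pixels
    y: float  # vertical centre, in pixels


@dataclass
class Field:
    """A field description (hypothetical, not ODL): label plus value constraint."""
    name: str
    label: str          # anchor text expected somewhere on the image
    value_pattern: str  # value constraint, expressed as a regular expression


def extract(fields, spans, max_dy=10.0):
    """Bind each field to the nearest span that satisfies both constraints.

    Spatial constraint (assumed): the value lies on roughly the same row as
    the label (|dy| <= max_dy) and to its right.
    Value constraint: the span text fully matches the field's regex.
    """
    result = {}
    for field in fields:
        anchors = [s for s in spans if field.label.lower() in s.text.lower()]
        best = None  # (horizontal distance, value text)
        for anchor in anchors:
            for span in spans:
                if span is anchor:
                    continue
                if abs(span.y - anchor.y) > max_dy or span.x <= anchor.x:
                    continue  # violates the spatial constraint
                text = span.text.strip()
                if not re.fullmatch(field.value_pattern, text):
                    continue  # violates the value constraint
                distance = span.x - anchor.x
                if best is None or distance < best[0]:
                    best = (distance, text)
        result[field.name] = best[1] if best else None
    return result


spans = [
    Span("Heart rate", 10, 100), Span("72", 120, 103), Span("bpm", 150, 102),
    Span("Age", 10, 140), Span("45", 120, 141),
]
fields = [
    Field("heart_rate", "heart rate", r"\d{2,3}"),
    Field("age", "age", r"\d{1,3}"),
]
print(extract(fields, spans))
```

In this sketch the vertical tolerance absorbs small positional variance between images, and the regular expression rejects spans whose text does not look like a valid value, which loosely mirrors the two kinds of constraints the abstract describes.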
Keywords
Information extraction, Medical images, Electronic medical records, Domain-specific language, Spatial layout, Optical character recognition