Segmentation-Less Extraction of Text and Non-Text Regions From JPEG 2000 Compressed Document Images Through Partial and Intelligent Decompression

IEEE Access (2023)

Abstract
JPEG 2000 is a popular image compression standard that uses the Discrete Wavelet Transform (DWT) and provides many rich features for efficient storage and decompression. Although compressed images are preferred for archival and communication purposes, processing them is difficult because of the overhead of decompression and re-compression, which must be repeated every time the data needs to be operated on. Therefore, this paper proposes the novel idea of operating directly on JPEG 2000 compressed documents to extract text and non-text regions without using any segmentation algorithm. In contrast to conventional methods, which fully decompress the document and then process it, the technique avoids full decompression. Moreover, JPEG 2000 features are exploited to partially and intelligently decompress only the selected regions of interest at different resolutions and bit depths, accomplishing segmentation-less extraction of text and non-text regions. Finally, the Maximally Stable Extremal Regions (MSER) algorithm is used to extract the layout of the segmented text and non-text regions for further analysis. Experiments carried out on the standard PRImA Layout Analysis Dataset show promising results while saving computational resources.
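The abstract does not describe the implementation, so the following is only a toy sketch of why decoding a region at reduced resolution and bit depth is cheap, not the authors' method. Assumptions: a one-level 2-D Haar transform (averaging normalization) stands in for the 5/3 or 9/7 wavelet filters that JPEG 2000 specifies, and keeping only the LL subband plays the role of discarding resolution levels; bit-plane truncation is applied to pixel values for simplicity, whereas a real JPEG 2000 decoder truncates bit-planes of the wavelet coefficients. The helper `partial_decode` and its `region`, `levels`, and `bits` parameters are hypothetical.

```python
def haar_ll(image):
    """LL subband of a one-level 2-D Haar DWT, using the averaging
    normalization (floored mean of each 2x2 block). Odd trailing
    rows/columns are dropped for simplicity."""
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) // 4
            for c in range(0, len(image[0]) - 1, 2)
        ]
        for r in range(0, len(image) - 1, 2)
    ]


def keep_bitplanes(image, bits, depth=8):
    """Zero all but the `bits` most significant bit-planes of each sample."""
    shift = depth - bits
    return [[(v >> shift) << shift for v in row] for row in image]


def partial_decode(image, region, levels, bits):
    """Hypothetical helper: crop a region of interest given as
    (top, left, height, width), discard `levels` resolution levels by
    keeping only the LL subband, then keep `bits` bit-planes."""
    top, left, h, w = region
    out = [row[left:left + w] for row in image[top:top + h]]
    for _ in range(levels):
        out = haar_ll(out)
    return keep_bitplanes(out, bits)


if __name__ == "__main__":
    # 4x4 8-bit "page": dark text-like pixels on the left, white background on the right.
    page = [
        [10, 20, 250, 255],
        [30, 40, 245, 250],
        [12, 18, 252, 254],
        [28, 42, 248, 246],
    ]
    print(partial_decode(page, (0, 0, 4, 4), levels=1, bits=2))  # [[0, 192], [0, 192]]
```

Even at half resolution and 2 bits per sample, the dark and light areas remain clearly separable, which is the kind of coarse signal a region detector such as MSER (for example, OpenCV's `cv2.MSER_create()`) could then operate on instead of the fully decoded page.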
Keywords
Image coding, Image segmentation, Transform coding, Image resolution, Layout, Discrete wavelet transforms, Image color analysis, Bitdepths, DWT, JPEG 2000, MSER, partial and intelligent decompression, resolutions, text and non-text segmentation