Learning Free Document Image Binarization Based on Fast Fuzzy C-Means Clustering

ICDAR (2019)

Abstract
In this paper, a novel local threshold binarization method using fast Fuzzy C-Means clustering is proposed. Historical document images with non-uniform backgrounds, stains, and faded ink are first processed by removing the background with an inpainting-based method. Fuzzy C-Means clustering is then used to group the pixels into three main clusters: sure text pixels, sure background pixels, and confused pixels, which may or may not be text. Based on the structural symmetry of pixels (SSP), these confused pixels are then classified as either text or background. The SSP is defined as those pixels around strokes whose gradient magnitudes are sufficiently large and whose gradient directions are opposite. As the gradient map is our basis for computing the SSP, we further propose to estimate the background surface first and to extract potential SSP in the compensated image, so as to deal with degradations of document images such as uneven illumination, low contrast, and stains. To demonstrate the effectiveness of our method, tests on eight public document image datasets are performed, and the experimental results show that our method outperforms other local threshold binarization approaches on both F-measure and PSNR.
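The three-cluster step described in the abstract can be illustrated with a minimal sketch of plain Fuzzy C-Means on grayscale intensities. This is not the authors' fast variant, and it omits the background removal and SSP stages. The function names (`fcm_memberships`, `cluster_pixels`), the fuzzifier `m = 2`, the evenly spaced initial centers, and the mapping of the darkest cluster to text and the brightest to background are all assumptions made for illustration.

```python
import numpy as np


def fcm_memberships(x, centers, m=2.0):
    """Fuzzy membership of each sample in x to each cluster center."""
    # Distance of every sample to every center; epsilon avoids division by zero.
    dist = np.abs(x[:, None] - centers[None, :]) + 1e-12
    inv = dist ** (-2.0 / (m - 1.0))
    # Normalize so that the memberships of each sample sum to 1.
    return inv / inv.sum(axis=1, keepdims=True)


def cluster_pixels(gray, n_clusters=3, m=2.0, n_iter=100, tol=1e-5):
    """Plain FCM on pixel intensities (hypothetical helper, not the paper's code).

    Returns the cluster centers sorted from dark to bright and a label image
    where, under the assumption of dark text on a light background,
    0 = sure text, 1 = confused, 2 = sure background.
    """
    x = np.asarray(gray, dtype=np.float64).ravel()
    # Assumed initialization: centers spread evenly over the intensity range.
    centers = np.linspace(x.min(), x.max(), n_clusters)
    for _ in range(n_iter):
        um = fcm_memberships(x, centers, m) ** m
        # Each center becomes the membership-weighted mean of the samples.
        new_centers = (um * x[:, None]).sum(axis=0) / um.sum(axis=0)
        shift = np.max(np.abs(new_centers - centers))
        centers = new_centers
        if shift < tol:
            break
    # Sort centers so that label indices follow increasing brightness.
    centers = np.sort(centers)
    labels = np.argmax(fcm_memberships(x, centers, m), axis=1)
    return centers, labels.reshape(np.shape(gray))


if __name__ == "__main__":
    # Synthetic example: dark, mid-gray, and bright pixel populations.
    rng = np.random.default_rng(0)
    img = np.concatenate([
        rng.normal(30, 5, 200),
        rng.normal(128, 5, 200),
        rng.normal(225, 5, 200),
    ]).clip(0, 255).reshape(20, 30)
    centers, labels = cluster_pixels(img)
    print("centers:", centers)
    print("pixels per cluster:", np.bincount(labels.ravel(), minlength=3))
```

In this sketch only the pixels labeled 1 would be passed on to a later decision stage; in the method described above, that role is played by the SSP-based classification.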
Keywords
Binarization, Fuzzy C-Means, Background Removal, Stroke Width Estimation