Combining Learned Script Points and Combinatorial Optimization for Text Line Extraction

HIP@ICDAR(2015)

Abstract
Complex layouts, curved text lines, heterogeneous backgrounds, noise, and clutter still make text line extraction from historical documents a challenging task at which traditional methods do not excel. We propose a novel text line extraction method with two contributions: first, text-specific interest points extracted by supervised machine learning; and second, a reformulation of bottom-up text line aggregation as a noise-robust combinatorial optimization problem. In a final step, unsupervised clustering eliminates invalid text lines. Because the method builds on interest points and poses aggregation as a global optimization problem, it can detect text lines with arbitrary orientation and curvature and is robust to noise and clutter. Experimental evaluations on the IAM Saint Gall dataset show promising results.