Handwritten Text Line Extraction Based On Minimum Spanning Tree Clustering
2007 International Conference on Wavelet Analysis and Pattern Recognition, Vols 1-4, Proceedings (2007)
Abstract
Text line extraction from unconstrained handwritten documents is challenging because the text lines are often skewed and curved, and the space between lines is not obvious. To solve this problem, we propose an approach based on minimum spanning tree (MST) clustering with new distance measures. First, the connected components of the document image are grouped into a tree by MST clustering with a new distance measure. The edges of the tree are then dynamically cut to form text lines, using a new objective function to determine the number of clusters. This approach is entirely parameter-free and can be applied to various documents with multi-skewed and curved lines. Experiments on handwritten Chinese documents demonstrate the effectiveness of the approach.
Keywords
OCR, handwritten text line extraction, connected component labeling, MST clustering, multi-skewed document
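
The two-stage idea in the abstract (build an MST over connected components, then cut edges to obtain text lines) can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not specify its distance measure or its objective function, so the sketch substitutes a hypothetical anisotropic distance (`anisotropic_distance`, which penalizes vertical offsets more than horizontal ones) and a fixed cut threshold (`max_edge`), whereas the paper is parameter-free. The centroids are made-up values standing in for connected components.

```python
import math


def anisotropic_distance(p, q, vertical_weight=3.0):
    """Hypothetical stand-in for the paper's distance measure.

    Vertical offsets are weighted more heavily so that components on the
    same line look closer than components on neighbouring lines.
    """
    dx = p[0] - q[0]
    dy = p[1] - q[1]
    return math.hypot(dx, vertical_weight * dy)


def minimum_spanning_tree(points, dist):
    """Prim's algorithm on the complete graph; returns (weight, i, j) edges."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree with the cheapest connecting edge.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def cut_into_lines(points, edges, max_edge):
    """Keep MST edges no longer than max_edge and label the resulting groups.

    The fixed threshold is an assumption of this sketch; the paper instead
    selects the cuts with an objective function.
    """
    label = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while label[i] != i:
            label[i] = label[label[i]]
            i = label[i]
        return i

    for weight, i, j in edges:
        if weight <= max_edge:
            label[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


# Made-up centroids of connected components on two slightly skewed lines.
centroids = [(10, 10), (30, 12), (50, 11), (12, 40), (32, 42), (52, 41)]
edges = minimum_spanning_tree(centroids, anisotropic_distance)
labels = cut_into_lines(centroids, edges, max_edge=40.0)
```

With these sample centroids the single long edge joining the two rows is removed, so the first three components share one label and the last three share another. Replacing the threshold with a criterion that scores candidate partitions would be closer in spirit to the dynamic cutting the abstract describes.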