
Robust Text Detection In Natural Images With Edge-Enhanced Maximally Stable Extremal Regions

2011 18TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), pp.2609-2612, (2011)

Cited by: 442

Abstract

Detecting text in natural images is an important prerequisite for many content-based image analysis tasks. In this paper, we propose a novel text detection algorithm, which employs edge-enhanced Maximally Stable Extremal Regions as basic letter candidates. These candidates are then filtered using geometric and stroke width information to exclude non-text objects. Letters are paired to identify text lines.

Introduction
  • Mobile visual search has gained popular interest with the increasing availability of high-performance, low-cost camera-phones.
  • Visual search systems have been developed for applications such as product recognition [1, 2] and landmark recognition [3].
  • In these systems, local image features [4, 5, 6] are extracted from images taken with a camera-phone and are matched to a large database using visual word indexing techniques [7, 8].
  • Given the vast number of text-based search engines, retrieving an image using its embedded text offers an efficient supplement to visual search systems.
Highlights
  • Mobile visual search has gained popular interest with the increasing availability of high-performance, low-cost camera-phones
  • We propose a novel connected component (CC)-based text detection algorithm, which employs Maximally Stable Extremal Regions (MSER) [18] as our basic letter candidates
  • Motivated by Epshtein’s work on the Stroke Width Transform (SWT) [16], we develop an image operator to transform the binary image into its stroke width image
  • We apply our algorithm to a document database, which we have created to test a document retrieval system based on text as well as low bit rate features in [29]
  • To overcome the sensitivity of MSER with respect to image blur and to detect even very small letters, we developed an edge-enhanced MSER which exploits the complementary properties of MSER and Canny edges
  • Our system can be efficiently combined with visual search systems by sharing MSER as interest regions
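The stroke-width operator mentioned in the highlights can be illustrated with a distance transform on a binary connected component. The following is a minimal sketch, not the paper's exact operator (which propagates maximum distance values across the whole component); the function names `distance_transform` and `stroke_width` are illustrative, and the brute-force transform is only practical for small patches:

```python
import numpy as np

def distance_transform(binary):
    """Brute-force Euclidean distance transform: for each foreground
    pixel, the distance to the nearest background pixel."""
    fg = np.argwhere(binary)   # coordinates of foreground pixels
    bg = np.argwhere(~binary)  # coordinates of background pixels
    dt = np.zeros(binary.shape)
    for y, x in fg:
        d = np.sqrt(((bg - (y, x)) ** 2).sum(axis=1))
        dt[y, x] = d.min()
    return dt

def stroke_width(binary):
    """Estimate the stroke width of a binary component as
    2 * max(distance transform) - 1, exact for odd-width straight strokes."""
    return 2 * distance_transform(binary).max() - 1

# A horizontal bar 5 pixels thick, mimicking one stroke of a letter:
img = np.zeros((9, 20), dtype=bool)
img[2:7, 2:18] = True
print(stroke_width(img))  # -> 5.0
```

In practice an efficient two-pass distance transform (as in Bailey [21]) replaces the brute-force loop; the stroke-width estimate is then used to reject components whose width varies too much to be a letter.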
Results
  • To evaluate the text detection algorithm, the authors apply it to two different test sets.
  • As a primary test, the authors use the well-known ICDAR text detection competition data set [26, 15], which was used as a benchmark for [16, 27, 28].
  • Two competitions (ICDAR 2003 and 2005) have been held to evaluate the performance of various text detection algorithms [26, 15].
  • Since estimated rectangles rarely align exactly with the manually labeled ground truth, the f metric can vary between 0.8 and 1.0 even when all text is correctly localized
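The evaluation scheme behind that observation can be sketched as follows. This is a simplified version of the ICDAR protocol, assuming axis-aligned rectangles and the standard α = 0.5 harmonic weighting; the function names are illustrative:

```python
def area(r):
    # Rectangles are (x1, y1, x2, y2) tuples.
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])

def match(a, b):
    """ICDAR-style match between two rectangles: twice the
    intersection area divided by the sum of the two areas."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    denom = area(a) + area(b)
    return 2 * ix * iy / denom if denom else 0.0

def f_score(estimates, targets, alpha=0.5):
    """Harmonic combination of precision and recall, each the mean
    best-match score of one rectangle set against the other."""
    best = lambda r, rects: max((match(r, q) for q in rects), default=0.0)
    p = sum(best(e, targets) for e in estimates) / len(estimates)
    r = sum(best(t, estimates) for t in targets) / len(targets)
    return 1.0 / (alpha / p + (1.0 - alpha) / r)

truth = [(0, 0, 10, 10)]
print(f_score([(0, 0, 10, 10)], truth))  # exact localization -> 1.0
print(f_score([(0, 0, 10, 12)], truth))  # slightly loose box -> ~0.91
```

The second call shows why correctly localized but loosely fitted boxes still score below 1.0: a detection that is two pixels too tall already drops the f score to about 0.91.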
Conclusion
  • A novel text detection algorithm is proposed, which employs Maximally Stable Extremal Regions as basic letter candidates.
  • The authors present a novel image operator to accurately determine the stroke width of binary CCs; the proposed method has demonstrated state-of-the-art performance for localizing text in natural images.
  • The detected text consists of binarized letter patches, which can be directly used for text recognition purposes.
  • The authors' system can be efficiently combined with visual search systems by sharing MSER as interest regions
Tables
  • Table 1: Evaluation of text detection algorithms
Funding
  • Our algorithm achieves an f score similar to Epshtein [16], outperforming all results from the text detection competition
References
  • [1] S. S. Tsai, D. Chen, V. Chandrasekhar, G. Takacs, N. M. Cheung, R. Vedantham, R. Grzeszczuk, and B. Girod, “Mobile product recognition,” in Proc. ACM Multimedia, 2010.
  • [2] D. Chen, S. S. Tsai, C. H. Hsu, K. Kim, J. P. Singh, and B. Girod, “Building book inventories using smartphones,” in Proc. ACM Multimedia, 2010.
  • [3] G. Takacs, Y. Xiong, R. Grzeszczuk, V. Chandrasekhar, W. Chen, L. Pulli, N. Gelfand, T. Bismpigiannis, and B. Girod, “Outdoors augmented reality on mobile phone using loxel-based visual feature organization,” in Proc. ACM Multimedia Information Retrieval, 2008, pp. 427–434.
  • [4] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, pp. 91–110, 2004.
  • [5] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “Speeded-up robust features (SURF),” Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.
  • [6] V. Chandrasekhar, G. Takacs, D. Chen, S. Tsai, R. Grzeszczuk, and B. Girod, “CHoG: Compressed histogram of gradients, a low bit-rate feature descriptor,” in CVPR, 2009, pp. 2504–2511.
  • [7] D. Nister and H. Stewenius, “Scalable recognition with a vocabulary tree,” in CVPR, 2006, pp. 2161–2168.
  • [8] D. M. Chen, S. S. Tsai, V. Chandrasekhar, G. Takacs, R. Vedantham, R. Grzeszczuk, and B. Girod, “Inverted index compression for scalable image matching,” in Proc. IEEE Data Compression Conference (DCC), Snowbird, Utah, March 2010.
  • [9] J. Liang, D. Doermann, and H. P. Li, “Camera-based analysis of text and documents: a survey,” IJDAR, vol. 7, no. 2–3, pp. 84–104, 2005.
  • [10] K. Jung, K. I. Kim, and A. K. Jain, “Text information extraction in images and video: a survey,” Pattern Recognition, vol. 37, no. 5, pp. 977–997, 2004.
  • [11] Y. Zhong, H. Zhang, and A. K. Jain, “Automatic caption localization in compressed video,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 4, pp. 385–392, 2000.
  • [12] Q. Ye, Q. Huang, W. Gao, and D. Zhao, “Fast and robust text detection in images and video frames,” Image Vision Comput., vol. 23, pp. 565–576, 2005.
  • [13] X. Chen and A. L. Yuille, “Detecting and reading text in natural scenes,” in CVPR, 2004, vol. 2, pp. 366–373.
  • [14] X. Chen and A. L. Yuille, “A time-efficient cascade for real-time object detection: with applications for the visually impaired,” in CVPR Workshops, 2005, p. 28.
  • [15] S. M. Lucas, “ICDAR 2005 text locating competition results,” in ICDAR, 2005, vol. 1, pp. 80–84.
  • [16] B. Epshtein, E. Ofek, and Y. Wexler, “Detecting text in natural scenes with stroke width transform,” in CVPR, 2010, pp. 2963–2970.
  • [17] P. Shivakumara, T. Q. Phan, and C. L. Tan, “A Laplacian approach to multi-oriented text detection in video,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 2, pp. 412–419, Feb. 2011.
  • [18] J. Matas, O. Chum, M. Urban, and T. Pajdla, “Robust wide baseline stereo from maximally stable extremal regions,” in British Machine Vision Conference, 2002, vol. 1, pp. 384–393.
  • [19] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool, “A comparison of affine region detectors,” Int. J. Comput. Vision, vol. 65, pp. 43–72, 2005.
  • [20] D. Nister and H. Stewenius, “Linear time maximally stable extremal regions,” in ECCV, 2008, pp. 183–196.
  • [21] D. G. Bailey, “An efficient Euclidean distance transform,” in Combinatorial Image Analysis (IWCIA), 2004, pp. 394–408.
  • [22] J. Canny, “A computational approach to edge detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 8, pp. 679–698, 1986.
  • [23] A. Srivastav and J. Kumar, “Text detection in scene images using stroke width and nearest-neighbor constraints,” in TENCON 2008, IEEE Region 10 Conference, 2008, pp. 1–5.
  • [24] K. Subramanian, P. Natarajan, M. Decerbo, and D. Castanon, “Character-stroke detection for text-localization and extraction,” in ICDAR, 2007, vol. 1, pp. 33–37.
  • [25] N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Trans. Syst. Man Cybern., vol. 9, no. 1, pp. 62–66, 1979.
  • [26] S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, and R. Young, “ICDAR 2003 robust reading competitions,” in ICDAR, 2003, vol. 2, p. 682.
  • [27] R. Minetto, N. Thome, M. Cord, J. Fabrizio, and B. Marcotegui, “SnooperText: a multiresolution system for text detection in complex visual scenes,” in ICIP, 2010, pp. 3861–3864.
  • [28] J. Fabrizio, M. Cord, and B. Marcotegui, “Text extraction from street level images,” in CMRT, 2009, pp. 199–204.
  • [29] S. S. Tsai, H. Chen, D. M. Chen, G. Schroth, R. Grzeszczuk, and B. Girod, “Mobile visual search on papers using text and low bit-rate features,” in ICIP, 2011.