Robust Text Detection In Natural Images With Edge-Enhanced Maximally Stable Extremal Regions
2011 18th IEEE International Conference on Image Processing (ICIP), pp. 2609–2612.
- Mobile visual search has gained popular interest with the increasing availability of high-performance, low-cost camera-phones.
- Visual search systems have been developed for applications such as product recognition [1, 2] and landmark recognition.
- In these systems, local image features [4, 5, 6] are extracted from images taken with a camera-phone and are matched to a large database using visual word indexing techniques [7, 8].
- Given the vast number of text-based search engines, retrieving an image via its embedded text offers an efficient supplement to visual search systems.
- We propose a novel connected component (CC)-based text detection algorithm, which employs Maximally Stable Extremal Regions (MSER) as our basic letter candidates.
- Motivated by Epshtein’s work on the Stroke Width Transform (SWT), we develop an image operator to transform a binary image into its stroke width image.
- We apply our algorithm to a document database, which we created to test a document retrieval system based on text as well as low bit-rate features.
- To overcome the sensitivity of MSER with respect to image blur, and to detect even very small letters, we developed an edge-enhanced MSER which exploits the complementary properties of MSER and Canny edges.
- Our system can be efficiently combined with visual search systems by sharing MSER as interest regions.
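The stroke width operator is only sketched at a high level above. As a hedged illustration of the underlying idea (not the paper’s exact operator), one simple way to approximate the stroke width of a binary CC is a distance transform: a stroke is roughly twice as wide as its deepest interior pixel. The function names and the city-block BFS below are illustrative choices.

```python
from collections import deque

def distance_to_background(mask):
    """Multi-source BFS: city-block distance from each foreground
    pixel (True) to the nearest background pixel (False)."""
    h, w = len(mask), len(mask[0])
    dist = [[None] * w for _ in range(h)]
    q = deque()
    # Seed the queue with every background pixel at distance 0.
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                dist[y][x] = 0
                q.append((y, x))
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                q.append((ny, nx))
    return dist

def stroke_width(mask):
    """Estimate the stroke width of a binary component: twice the
    deepest interior distance, minus one for the shared center pixel."""
    dist = distance_to_background(mask)
    deepest = max(d for row in dist for d in row)
    return 2 * deepest - 1
```

For example, a horizontal bar five pixels thick (surrounded by background) has a deepest interior distance of 3, giving an estimated stroke width of 5. Real operators (including the SWT and the paper’s method) handle even widths, diagonal strokes, and per-pixel propagation more carefully than this sketch.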
- To evaluate the text detection algorithm, the authors apply it to two different test sets.
- As a primary test, the authors use the well-known ICDAR text detection competition data set [26, 15], which was used as a benchmark for [16, 27, 28].
- Two competitions (ICDAR 2003 and 2005) have been held to evaluate the performance of various text detection algorithms [26, 15].
- Since an algorithm is unlikely to produce estimated rectangles that exactly align with the manually labeled ground truth, the f metric can vary between 0.8 and 1.0 even when all text is correctly localized.
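The f metric above combines precision and recall over rectangle overlaps. The sketch below uses a Dice-style overlap score and the harmonic mean (α = 0.5); it is close in spirit to, but not identical with, the exact ICDAR matching protocol, and all function names are illustrative. It also shows why f stays below 1.0 for slightly misaligned boxes.

```python
def area(r):
    """Area of an axis-aligned box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = r
    return max(0, x1 - x0) * max(0, y1 - y0)

def intersection(a, b):
    """Area of overlap between two boxes."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return area((x0, y0, x1, y1))

def match(a, b):
    """Dice-style overlap in [0, 1]; equals 1 only for identical boxes."""
    return 2 * intersection(a, b) / (area(a) + area(b))

def best_match(r, rects):
    return max((match(r, s) for s in rects), default=0.0)

def f_score(estimates, truth, alpha=0.5):
    """Precision over estimates, recall over ground truth,
    combined as 1 / (alpha/p + (1 - alpha)/r)."""
    p = sum(best_match(e, truth) for e in estimates) / len(estimates)
    r = sum(best_match(t, estimates) for t in truth) / len(truth)
    return 1.0 / (alpha / p + (1 - alpha) / r)
```

With this scoring, a detection offset by only a pixel or two from the ground truth already yields f around 0.9, which is why a well-localized result may score in the 0.8–1.0 range rather than exactly 1.0.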
- A novel text detection algorithm is proposed, which employs Maximally Stable Extremal Regions as basic letter candidates.
- The authors present a novel image operator to accurately determine the stroke width of binary CCs. The authors' proposed method has demonstrated state-of-the-art performance for localizing text in natural images.
- The detected text consists of binarized letter patches, which can be used directly for text recognition.
- The authors' system can be efficiently combined with visual search systems by sharing MSER as interest regions.
- Table 1: Evaluation of text detection algorithms
- Our algorithm achieves an f score similar to Epshtein's, outperforming all results from the text detection competitions.
- S. S. Tsai, D. Chen, V. Chandrasekhar, G. Takacs, N. M. Cheung, R. Vedantham, R. Grzeszczuk, and B. Girod, “Mobile product recognition,” in Proc. ACM Multimedia, 2010.
- D. Chen, S. S. Tsai, C. H. Hsu, K. Kim, J. P. Singh, and B. Girod, “Building book inventories using smartphones,” in Proc. ACM Multimedia, 2010.
- G. Takacs, Y. Xiong, R. Grzeszczuk, V. Chandrasekhar, W. Chen, L. Pulli, N. Gelfand, T. Bismpigiannis, and B. Girod, “Outdoors augmented reality on mobile phone using loxel-based visual feature organization,” in Proc. ACM Multimedia Information Retrieval, 2008, pp. 427–434.
- D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, pp. 91–110, 2004.
- H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “Speeded-up robust features (SURF),” Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.
- V. Chandrasekhar, G. Takacs, D. Chen, S. Tsai, R. Grzeszczuk, and B. Girod, “CHoG: Compressed histogram of gradients: A low bit-rate feature descriptor,” in CVPR, 2009, pp. 2504–2511.
- D. Nister and H. Stewenius, “Scalable recognition with a vocabulary tree,” in CVPR, 2006, pp. 2161–2168.
- D. M. Chen, S. S. Tsai, V. Chandrasekhar, G. Takacs, R. Vedantham, R. Grzeszczuk, and B. Girod, “Inverted Index Compression for Scalable Image Matching,” in Proc. of IEEE Data Compression Conference (DCC), Snowbird, Utah, March 2010.
- J. Liang, D. Doermann, and H. P. Li, “Camera-based analysis of text and documents: a survey,” IJDAR, vol. 7, no. 2-3, pp. 84–104, 2005.
- K. Jung, K. I. Kim, and A. K. Jain, “Text information extraction in images and video: a survey,” Pattern Recognition, vol. 37, no. 5, pp. 977–997, 2004.
- Y. Zhong, H. Zhang, and A. K. Jain, “Automatic caption localization in compressed video,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 4, pp. 385–392, 2000.
- Q. Ye, Q. Huang, W. Gao, and D. Zhao, “Fast and robust text detection in images and video frames,” Image Vision Comput., vol. 23, pp. 565–576, 2005.
- X. Chen and A. L. Yuille, “Detecting and reading text in natural scenes,” in CVPR, 2004, vol. 2, pp. II-366–II-373.
- X. Chen and A. L. Yuille, “A time-efficient cascade for real-time object detection: With applications for the visually impaired,” in CVPR - Workshops, 2005, p. 28.
- S. M. Lucas, “ICDAR 2005 text locating competition results,” in ICDAR, 2005, vol. 1, pp. 80–84.
- B. Epshtein, E. Ofek, and Y. Wexler, “Detecting text in natural scenes with stroke width transform,” in CVPR, 2010, pp. 2963–2970.
- P. Shivakumara, T. Q. Phan, and C. L. Tan, “A Laplacian approach to multi-oriented text detection in video,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 2, pp. 412–419, Feb. 2011.
- J. Matas, O. Chum, M. Urban, and T. Pajdla, “Robust wide baseline stereo from maximally stable extremal regions,” in British Machine Vision Conference, 2002, vol. 1, pp. 384–393.
- K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool, “A comparison of affine region detectors,” Int. J. Comput. Vision, vol. 65, pp. 43–72, 2005.
- D. Nister and H. Stewenius, “Linear time maximally stable extremal regions,” in ECCV, 2008, pp. 183–196.
- D. G. Bailey, “An efficient euclidean distance transform,” in Combinatorial Image Analysis, IWCIA, 2004, pp. 394–408.
- J. Canny, “A computational approach to edge detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 8, pp. 679–698, 1986.
- A. Srivastav and J. Kumar, “Text detection in scene images using stroke width and nearest-neighbor constraints,” in TENCON 2008 - 2008 IEEE Region 10 Conference, 2008, pp. 1–5.
- K. Subramanian, P. Natarajan, M. Decerbo, and D. Castanon, “Character-stroke detection for text-localization and extraction,” in ICDAR, 2007, vol. 1, pp. 33–37.
- N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Trans. Syst. Man Cybern., vol. 9, no. 1, pp. 62–66, 1979.
- S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, and R. Young, “ICDAR 2003 robust reading competitions,” in ICDAR, 2003, vol. 2, p. 682.
- R. Minetto, N. Thome, M. Cord, J. Fabrizio, and B. Marcotegui, “SnooperText: A multiresolution system for text detection in complex visual scenes,” in ICIP, 2010, pp. 3861–3864.
- J. Fabrizio, M. Cord, and B. Marcotegui, “Text extraction from street level images,” in CMRT, 2009, pp. 199–204.
- S. S. Tsai, H. Chen, D. M. Chen, G. Schroth, R. Grzeszczuk, and B. Girod, “Mobile visual search on papers using text and low bit-rate features,” in ICIP, 2011.