Don’t Hit Me! Glass Detection in Real-World Scenes

CVPR, pp. 3684-3693, 2020.

Keywords:
instance segmentation, vision systems, semantic segmentation, batch normalization, atrous spatial pyramid pooling

Abstract:

Glass is very common in our daily life. Existing computer vision systems neglect it, which may have severe consequences, e.g., a robot may crash into a glass wall. However, sensing the presence of glass is not straightforward. The key challenge is that arbitrary objects/scenes can appear behind the glass, and the content within the glass…

Introduction
  • Glass is a non-crystalline, often transparent, amorphous solid that has widespread practical and decorative usages, e.g., window panes, glass doors, and glass walls.
  • Such glass objects can have a critical impact on existing vision systems, as demonstrated in Figure 1, and would further affect intelligent decisions in many applications such as robotic navigation and drone tracking, i.e., the robot/drone might crash into the glass.
Highlights
  • Glass is a non-crystalline, often transparent, amorphous solid that has widespread practical and decorative usages, e.g., window panes, glass doors, and glass walls
  • The first two metrics are the intersection over union (IoU) and pixel accuracy (PA), which are widely used in the semantic segmentation field
  • We adopt the F-measure and mean absolute error (MAE) metrics from the salient object detection field
  • We have proposed an important problem of detecting glass from a single RGB image and provided a large-scale glass detection dataset (GDD) covering diverse scenes in our daily life
  • Extensive evaluations on the images in and beyond the glass detection dataset test set verify the effectiveness of our network
  • As the first attempt to address the glass detection problem with a computational approach, we focus in this paper on detecting glass from a single RGB image
Methods
  • As a first attempt to detect glass from a single RGB image, the authors validate the effectiveness of GDNet by comparing it with 18 state-of-the-art methods from related fields.
  • While these state-of-the-art methods are typically confused by non-glass regions that share similar boundaries/appearances with glass regions, the proposed method successfully eliminates such ambiguities and detects only the real glass regions (e.g., 1st, 7th, and 8th rows).
  • This is mainly attributed to the proposed large-field contextual feature learning, which provides abundant contextual information for context inference and glass localization.
Results
  • Evaluation metrics

    For a comprehensive evaluation, the authors adopt five metrics for quantitatively evaluating the glass detection performance.
  • The authors adopt the F-measure and mean absolute error (MAE) metrics from the salient object detection field.
  • Table 1 reports the quantitative results of glass detection on the proposed GDD test set.
  • It can be seen that the method is capable of accurately detecting both small glass and large glass (e.g., 4th-7th rows).
  • This is mainly because the multi-scale contextual features extracted by the LCFI module help the network better locate and segment glass.
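The four metrics named above can be sketched as follows. This is a minimal NumPy sketch, not the authors' evaluation code; it assumes binary ground-truth masks, a 0.5 threshold on predictions, and the beta^2 = 0.3 weighting commonly used for the F-measure in salient object detection.

```python
import numpy as np

def glass_metrics(pred, gt, beta2=0.3):
    """Compute IoU, pixel accuracy, F-measure, and MAE for 2-D masks in [0, 1].

    Hypothetical helper for illustration: predictions are thresholded at 0.5,
    and beta2 is the F-measure weight common in salient object detection.
    """
    pred_bin = pred >= 0.5
    gt_bin = gt >= 0.5

    inter = np.logical_and(pred_bin, gt_bin).sum()
    union = np.logical_or(pred_bin, gt_bin).sum()
    iou = inter / union if union else 1.0

    pa = (pred_bin == gt_bin).mean()  # fraction of correctly labeled pixels

    precision = inter / pred_bin.sum() if pred_bin.sum() else 0.0
    recall = inter / gt_bin.sum() if gt_bin.sum() else 0.0
    denom = beta2 * precision + recall
    f_measure = (1 + beta2) * precision * recall / denom if denom else 0.0

    # MAE is computed on the continuous prediction, without thresholding
    mae = np.abs(pred.astype(float) - gt.astype(float)).mean()
    return iou, pa, f_measure, mae
```

Higher IoU, PA, and F-measure and lower MAE indicate better glass detection.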
Conclusion
  • The authors have proposed an important problem of detecting glass from a single RGB image and provided a large-scale glass detection dataset (GDD) covering diverse scenes in our daily life.
  • A novel network is proposed to address this challenging task.
  • It leverages both high-level and low-level contexts extracted from a large field to detect glass of different sizes in various scenes.
Tables
  • Table1: Quantitative comparison to state-of-the-art methods on the GDD test set. All methods are re-trained on the GDD training set. * denotes using CRFs [13] for post-processing. “Statistics” means thresholding glass location statistics from our training set as a glass mask for detection. The first, second, and third best results are marked in red, green, and blue, respectively. Our method achieves the state-of-the-art under all five common evaluation metrics.
  • Table2: Component analysis. “base” denotes our network with all LCFI modules removed. “one scale” and “two scales” denote that there are one and two LCFI blocks in the LCFI module. “local” denotes replacing spatially separable convolutions in LCFI with local convolutions while keeping the parameter count approximately the same. Based on “local”, “sparse” adopts dilated convolutions to achieve a similar receptive field as spatially separable convolutions. “one path” denotes that there is only one spatially separable convolution path in each LCFI block. Our LCFI module contains four LCFI blocks, and each of them contains two parallel paths.
  • Table3: Comparison to MirrorNet [38] on the MSD test set.
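As a rough illustration of the design choice probed in Table 2, a spatially separable convolution factors a large k x k kernel into a k x 1 convolution followed by a 1 x k one, covering the same large field with far fewer parameters than a dense local convolution. The PyTorch sketch below is illustrative only; the channel count and kernel size are assumptions, not the paper's exact LCFI configuration.

```python
import torch
import torch.nn as nn

class SeparableContextConv(nn.Module):
    """A k x k receptive field built from a k x 1 then a 1 x k convolution.

    Hypothetical module for illustration; it does not reproduce the
    paper's LCFI block, only the spatially separable convolution idea.
    """
    def __init__(self, channels, k=7):
        super().__init__()
        pad = k // 2
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, (k, 1), padding=(pad, 0)),
            nn.Conv2d(channels, channels, (1, k), padding=(0, pad)),
        )

    def forward(self, x):
        return self.conv(x)

# Parameter comparison against a dense k x k convolution of the same extent
k, c = 7, 64
dense = nn.Conv2d(c, c, k, padding=k // 2)
separable = SeparableContextConv(c, k)
n_dense = sum(p.numel() for p in dense.parameters())
n_sep = sum(p.numel() for p in separable.parameters())
```

Both layers preserve the spatial size of the input, but the separable variant scales as O(2k) per output channel rather than O(k^2), which is what makes very large fields affordable.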
Related work
  • In this section, we briefly review state-of-the-art methods from relevant fields, including semantic/scene/instance segmentation, salient object detection, specific region detection/segmentation, and single image reflection removal.

    Semantic/scene/instance segmentation. Semantic segmentation aims to segment and parse a given image into different regions associated with semantic categories of discrete objects. Scene segmentation further considers stuff when assigning a label to each pixel. Recently, great progress has been achieved, benefiting from advances in deep neural networks. Based on fully convolutional networks (FCNs) [22], state-of-the-art model variants typically leverage multi-scale context aggregation or exploit more discriminative context to achieve high segmentation performance. For example, Chen et al. [1] introduce an atrous spatial pyramid pooling (ASPP) module to capture multi-scale context information. Zhao et al. [46] employ a pyramid pooling module to aggregate local and global context. Ding et al. [5] explore contextual contrasted features to boost the segmentation performance of small objects. Zhang et al. [40] introduce a channel attention mechanism to capture global context. Fu et al. [7] leverage channel- and spatial-wise non-local attention modules to capture contextual features with long-range dependencies. Huang et al. [12] further propose a criss-cross attention module to efficiently capture information from long-range dependencies.
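The ASPP idea cited above [1] can be sketched as parallel atrous (dilated) convolutions with different rates applied to the same feature map and then concatenated. This is a minimal sketch, not DeepLab's full module, which additionally uses a 1x1 branch, image-level pooling, and batch normalization; the rates and channel sizes here are illustrative.

```python
import torch
import torch.nn as nn

class ASPPSketch(nn.Module):
    """Parallel dilated 3x3 convolutions with different rates, concatenated.

    Illustrative sketch of atrous spatial pyramid pooling: each branch sees
    the same input at a different effective receptive field, and a 1x1
    convolution fuses the multi-scale context.
    """
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        # padding == dilation keeps the spatial size for a 3x3 kernel
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```

Because all branches preserve spatial resolution, their outputs can be concatenated channel-wise without any resampling.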
Funding
  • This work was supported in part by the National Natural Science Foundation of China, Grants 91748104, 61972067, 61632006, U1811463, U1908214, 61751203, and in part by the National Key Research and Development Program of China, Grants 2018AAA0102003 and 2018YFC0910506
Reference
  • Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE TPAMI, 2017.
  • Shuhan Chen, Xiuli Tan, Ben Wang, and Xuelong Hu. Reverse attention for salient object detection. In ECCV, 2018.
  • Ming-Ming Cheng, Niloy J Mitra, Xiaolei Huang, Philip HS Torr, and Shi-Min Hu. Global contrast based salient region detection. IEEE TPAMI, 2014.
  • Zijun Deng, Xiaowei Hu, Lei Zhu, Xuemiao Xu, Jing Qin, Guoqiang Han, and Pheng-Ann Heng. R3net: Recurrent residual refinement network for saliency detection. In IJCAI, 2018.
  • Henghui Ding, Xudong Jiang, Bing Shuai, Ai Qun Liu, and Gang Wang. Context contrasted feature and gated multiscale aggregation for scene segmentation. In CVPR, 2018.
  • Qingnan Fan, Jiaolong Yang, Gang Hua, Baoquan Chen, and David Wipf. A generic deep architecture for single image reflection removal and image smoothing. In ICCV, 2017.
  • Jun Fu, Jing Liu, Haijie Tian, Yong Li, Yongjun Bao, Zhiwei Fang, and Hanqing Lu. Dual attention network for scene segmentation. In CVPR, 2019.
  • Xiaofeng Han, Chuong Nguyen, Shaodi You, and Jianfeng Lu. Single image water hazard detection using fcn with reflection attention units. In ECCV, 2018.
  • Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross Girshick. Mask r-cnn. In ICCV, 2017.
  • Qibin Hou, Ming-Ming Cheng, Xiaowei Hu, Ali Borji, Zhuowen Tu, and Philip Torr. Deeply supervised salient object detection with short connections. In CVPR, 2017.
  • Xiaowei Hu, Lei Zhu, Chi-Wing Fu, Jing Qin, and PhengAnn Heng. Direction-aware spatial context features for shadow detection. In CVPR, 2018.
  • Zilong Huang, Xinggang Wang, Lichao Huang, Chang Huang, Yunchao Wei, and Wenyu Liu. Ccnet: Criss-cross attention for semantic segmentation. In ICCV, 2019.
  • Philipp Krahenbuhl and Vladlen Koltun. Efficient inference in fully connected crfs with gaussian edge potentials. In NIPS, 2011.
  • Anat Levin and Yair Weiss. User assisted separation of reflections from a single image using a sparsity prior. IEEE TPAMI, 2007.
  • Yu Li and Michael S. Brown. Single image layer separation using relative smoothness. In CVPR, 2014.
  • Zhengqi Li and Noah Snavely. Megadepth: Learning singleview depth prediction from internet photos. In CVPR, 2018.
  • Jiang-Jiang Liu, Qibin Hou, Ming-Ming Cheng, Jiashi Feng, and Jianmin Jiang. A simple pooling-based design for realtime salient object detection. In CVPR, 2019.
  • Nian Liu and Junwei Han. Dhsnet: Deep hierarchical saliency network for salient object detection. In CVPR, 2016.
  • Nian Liu, Junwei Han, and Ming-Hsuan Yang. Picanet: Learning pixel-wise contextual attention for saliency detection. In CVPR, 2018.
  • Shu Liu, Lu Qi, Haifang Qin, Jianping Shi, and Jiaya Jia. Path aggregation network for instance segmentation. In CVPR, 2018.
  • Wei Liu, Andrew Rabinovich, and Alexander C Berg. Parsenet: Looking wider to see better. arXiv:1506.04579, 2015.
  • Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
  • Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. In NeurIPS, 2019.
  • Xuebin Qin, Zichen Zhang, Chenyang Huang, Chao Gao, Masood Dehghan, and Martin Jagersand. Basnet: Boundaryaware salient object detection. In CVPR, 2019.
  • Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: towards real-time object detection with region proposal networks. IEEE TPAMI, 2017.
  • YiChang Shih, Dilip Krishnan, Fredo Durand, and William T. Freeman. Reflection removal using ghosting cues. In CVPR, 2015.
  • Jinming Su, Jia Li, Yu Zhang, Changqun Xia, and Yonghong Tian. Selectivity or invariance: Boundary-aware salient object detection. In ICCV, 2019.
  • Renjie Wan, Boxin Shi, Ling-Yu Duan, Ah-Hwee Tan, and Alex C. Kot. Crrn: Multi-scale guided concurrent reflection removal network. In CVPR, 2018.
  • Renjie Wan, Boxin Shi, Tan Ah Hwee, and Alex C Kot. Depth of field guided reflection removal. In ICIP, 2016.
  • Wenguan Wang, Jianbing Shen, Ming-Ming Cheng, and Ling Shao. An iterative and cooperative top-down and bottom-up inference network for salient object detection. In CVPR, 2019.
  • Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. Non-local neural networks. In CVPR, 2018.
  • Kaixuan Wei, Jiaolong Yang, Ying Fu, David Wipf, and Hua Huang. Single image reflection removal exploiting misaligned training data and network enhancements. In CVPR, 2019.
  • Sanghyun Woo, Jongchan Park, Joon-Young Lee, and In So Kweon. Cbam: Convolutional block attention module. In ECCV, 2018.
  • Zhe Wu, Li Su, and Qingming Huang. Cascaded partial decoder for fast and accurate salient object detection. In CVPR, 2019.
  • Saining Xie, Ross Girshick, Piotr Dollar, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In CVPR, 2017.
  • Jie Yang, Dong Gong, Lingqiao Liu, and Qinfeng Shi. Seeing deeply and bidirectionally: A deep learning approach for single image reflection removal. In ECCV, 2018.
  • Maoke Yang, Kun Yu, Chi Zhang, Zhiwei Li, and Kuiyuan Yang. Denseaspp for semantic segmentation in street scenes. In CVPR, 2018.
  • Xin Yang, Haiyang Mei, Ke Xu, Xiaopeng Wei, Baocai Yin, and Rynson W.H. Lau. Where is my mirror? In ICCV, 2019.
  • Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, and Nong Sang. Bisenet: Bilateral segmentation network for real-time semantic segmentation. In ECCV, 2018.
  • Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, and Amit Agrawal. Context encoding for semantic segmentation. In CVPR, 2018.
  • Lu Zhang, Ju Dai, Huchuan Lu, You He, and Gang Wang. A bi-directional message passing model for salient object detection. In CVPR, 2018.
  • Pingping Zhang, Dong Wang, Huchuan Lu, Hongyu Wang, and Xiang Ruan. Amulet: Aggregating multi-level convolutional features for salient object detection. In ICCV, 2017.
  • Xuaner Zhang, Ren Ng, and Qifeng Chen. Single image reflection separation with perceptual losses. In CVPR, 2018.
  • Xiaoning Zhang, Tiantian Wang, Jinqing Qi, Huchuan Lu, and Gang Wang. Progressive attention guided recurrent network for salient object detection. In CVPR, 2018.
  • Hengshuang Zhao, Xiaojuan Qi, Xiaoyong Shen, Jianping Shi, and Jiaya Jia. Icnet for real-time semantic segmentation on high-resolution images. In ECCV, 2018.
  • Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. Pyramid scene parsing network. In CVPR, 2017.
  • Hengshuang Zhao, Yi Zhang, Shu Liu, Jianping Shi, Chen Change Loy, Dahua Lin, and Jiaya Jia. Psanet: Point-wise spatial attention network for scene parsing. In ECCV, 2018.
  • Jia-Xing Zhao, Jiang-Jiang Liu, Deng-Ping Fan, Yang Cao, Jufeng Yang, and Ming-Ming Cheng. EGNet: Edge guidance network for salient object detection. In ICCV, 2019.
  • Ting Zhao and Xiangqian Wu. Pyramid feature attention network for saliency detection. In CVPR, 2019.
  • Quanlong Zheng, Xiaotian Qiao, Ying Cao, and Rynson W.H. Lau. Distraction-aware shadow detection. In CVPR, 2019.
  • Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio Torralba. Scene parsing through ade20k dataset. In CVPR, 2017.
  • Lei Zhu, Zijun Deng, Xiaowei Hu, Chi-Wing Fu, Xuemiao Xu, Jing Qin, and Pheng-Ann Heng. Bidirectional feature pyramid network with recurrent attention residual modules for shadow detection. In ECCV, 2018.