Segmenting Transparent Objects in the Wild

European Conference on Computer Vision (ECCV), pp. 696–711, 2020.


Abstract:

Transparent objects such as windows and bottles made of glass widely exist in the real world. Segmenting transparent objects is challenging because these objects inherit diverse appearance from the image background, making them look similar to their surroundings. Besides the technical difficulty of this task, only a f...

Introduction
  • Transparent objects widely exist in the real world, such as bottles, vitrines, windows, walls, and many other objects made of glass.
  • TOM-Net [2] has a large data size of 178K images, but all of them are generated with computer-graphics methods by overlaying a transparent object on different background images; that is, the images are not real and fall outside the distribution of natural images.
  • TOM-Net also provides 876 real images for testing, but these images have no manual annotations, so evaluation is performed by user study
Highlights
  • Transparent objects widely exist in the real world, such as bottles, vitrines, windows, walls, and many other objects made of glass
  • Although transparent object segmentation is important in computer vision, only a few previous datasets [1,2] were specially collected to explore this task, and they have major drawbacks
  • To address the above issues, this paper proposes a novel large-scale dataset for transparent object segmentation, named Trans10K, containing 10,428 real-world images of transparent objects, each manually labeled with a segmentation mask
  • We apply four metrics that are widely used in semantic segmentation, salient object detection and shadow detection to benchmark the performance of transparent object segmentation
  • We present the Trans10K dataset, which is, to the best of our knowledge, the largest real dataset for transparent object segmentation
  • We benchmark 20 semantic segmentation algorithms on this novel dataset and shed light on what attributes are especially difficult for current methods
Methods
  • 4.1 Network Architecture. The authors observe that the boundary is easier to observe than the content because it tends to have high contrast at the edges of transparent objects, which is consistent with human visual perception.
  • The authors fuse boundary information at different levels into the regular stream
  • In this part, the authors repeatedly apply BAM to the C1, C2, and C4 feature maps to show how the boundary attention module works.
  • By using a high-quality boundary map for attention, the feature maps of the regular stream receive higher weights in boundary regions, which can be viewed as a prior for transparent objects
  • This is consistent with human visual perception because the boundary is easier to locate than the content of a transparent object.
  • The evaluated methods include CGNet [29], HRNet [30], HardNet [31], DABNet [32], LEDNet [33], ICNet [8], BiSeNet [25], DenseASPP [11], DeepLabv3+R50 [14], FCN [13], OCNet [34], RefineNet [12], DeepLabv3+XP65 [14], DUNet [9], UNet [35], PSPNet [7], and TransLab
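The boundary attention idea above can be sketched numerically. This is an illustrative reading, not the paper's exact formulation: the function name and the residual `1 + boundary` weighting are our assumptions.

```python
import numpy as np

def boundary_attention(features, boundary_map):
    """Reweight a feature map with a predicted boundary map.

    features:     (C, H, W) activations from the regular stream.
    boundary_map: (H, W) boundary probabilities in [0, 1].
    The residual form keeps non-boundary responses intact while
    boosting boundary regions: out = features * (1 + boundary).
    """
    return features * (1.0 + boundary_map[None, :, :])
```

In the paper's setting this would be applied repeatedly to the C1, C2, and C4 feature maps, with the boundary map resized to each map's resolution.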
Results
  • The authors apply four metrics that are widely used in semantic segmentation, salient object detection and shadow detection to benchmark the performance of transparent object segmentation.
  • Balance error rate (BER) is adopted from the shadow detection field.
  • It considers the unbalanced areas of transparent and non-transparent regions.
  • BER was originally used to evaluate binary predictions; here the authors extend it to a mean balance error rate over the two fine-grained transparent categories. For a single binary mask, it is computed as BER = (1 − (TP/Np + TN/Nn)/2) × 100, where Np and Nn are the numbers of transparent and non-transparent pixels, and TP and TN are the numbers of correctly predicted pixels of each kind.
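The BER metric above can be sketched in a few lines of NumPy. The function names are ours, and the "mean" variant simply averages BER over the per-category mask pairs:

```python
import numpy as np

def balance_error_rate(pred, gt):
    """Balance error rate (BER) for one binary mask pair, in percent.

    BER = (1 - 0.5 * (TP / Np + TN / Nn)) * 100, where Np and Nn are
    the numbers of positive (transparent) and negative pixels in the
    ground truth, so sparse and dense classes are weighted equally.
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    return (1.0 - 0.5 * (tp / gt.sum() + tn / (~gt).sum())) * 100.0

def mean_ber(preds, gts):
    """Mean BER over fine-grained category masks (e.g. things and stuff)."""
    return float(np.mean([balance_error_rate(p, g) for p, g in zip(preds, gts)]))
```

A perfect prediction scores 0, a fully inverted one scores 100, and the balanced form prevents a trivial all-negative prediction from looking good when transparent regions are small.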
Conclusion
  • The boundary prediction stream faces an extreme imbalance between positive and negative samples.
  • The authors therefore choose Dice loss to supervise the training of the boundary stream. In this work, the authors present the Trans10K dataset, which is, to the best of their knowledge, the largest real dataset for transparent object segmentation.
  • The authors propose a boundary-aware algorithm, termed TransLab, to utilize the boundary prediction to improve the segmentation performance.
  • The authors plan to further design robust techniques to improve transparent object and instance segmentation.
  • The authors will also build a dataset for transparent instance segmentation.
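The Dice loss chosen for the boundary stream is robust to foreground/background imbalance because it normalizes the overlap by the sizes of both masks rather than counting pixels uniformly. A minimal soft-Dice sketch (the smoothing constant `eps` is a common convention, not a detail from the paper):

```python
import numpy as np

def dice_loss(pred, target, eps=1.0):
    """Soft Dice loss for a boundary probability map.

    pred, target: arrays of the same shape with values in [0, 1].
    Dice = 2|P ∩ G| / (|P| + |G|); the loss is 1 - Dice. Since both
    numerator and denominator scale with mask size, sparse positives
    (thin boundaries) are not swamped by the negative background.
    """
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```

By contrast, a plain per-pixel cross-entropy on a thin boundary map is dominated by the overwhelming number of easy negative pixels.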
Tables
  • Table1: Comparisons between Trans10K and previous transparent object datasets, where “Syn” denotes synthetic images produced with computer-graphics methods, “Thing” denotes small, movable objects, “Stuff” denotes large, fixed objects, and “MCC” denotes the Mean Connected Components per image. MCC approximately represents the number of objects and is reported to compare dataset complexity. The train and validation sets of TOM-Net were synthesized, and its test set has no mask annotations. Trans10K is much more challenging than prior art in all characteristics presented in this table, such as larger real data size, more diverse object sizes (i.e. number of pixels), and larger variations
  • Table2: Image statistics of Trans10K. MCC denotes Mean Connected Components in each image
  • Table3: Ablation study for different settings of the Boundary Attention Module
  • Table4: Ablation study for different loss functions of the boundary stream
  • Table5: Evaluated Semantic Segmentation methods. Sorted by FLOPs. Note that FLOPs is computed with one 512×512 image
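The MCC statistic in Tables 1 and 2 counts connected components per mask and averages over the dataset. A pure-Python/NumPy sketch (4-connectivity assumed; the function names are ours):

```python
import numpy as np
from collections import deque

def connected_components(mask):
    """Count 4-connected foreground components in a binary mask (BFS)."""
    mask = mask.astype(bool)
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    count = 0
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                count += 1
                seen[i, j] = True
                q = deque([(i, j)])
                while q:  # flood-fill this component
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
    return count

def mean_connected_components(masks):
    """MCC: average component count over a set of masks."""
    return sum(connected_components(m) for m in masks) / len(masks)
```

A higher MCC means more transparent objects per image on average, which is why the table uses it as a proxy for dataset complexity.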
Related work
  • Semantic Segmentation. Most state-of-the-art algorithms for semantic segmentation are based on CNNs. Earlier approaches [13,15] transfer classification networks to fully convolutional networks (FCNs) trained in an end-to-end manner. Several works apply structured prediction modules such as conditional random fields (CRFs) to the network output to improve segmentation performance, especially around object boundaries. To avoid costly DenseCRF, the work of [16] uses fast domain-transform filtering on the network output while also predicting edge maps from intermediate CNN layers. More recently, dramatic improvements in performance and inference speed have been driven by new architectural designs. For example, PSPNet [7] and DeepLab [10,17] propose pyramid pooling modules that incorporate multi-scale context by aggregating features at multiple scales. Other works [18,19,20] propose modules that use learned pixel affinities for structured information propagation across intermediate CNN representations.
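The pyramid pooling idea from PSPNet can be illustrated without a deep-learning framework: average-pool the feature map to several grid sizes, upsample each back, and concatenate along channels. This is a simplified sketch; PSPNet also applies learned 1×1 convolutions after each pooling branch, and the bin sizes (1, 2, 3, 6) follow its default configuration.

```python
import numpy as np

def pyramid_pooling(feat, bins=(1, 2, 3, 6)):
    """Simplified pyramid pooling over a (C, H, W) feature map.

    For each bin size b: average-pool to a b×b grid, upsample back to
    H×W with nearest-neighbor, and concatenate along channels, so the
    output mixes local detail with context at several scales.
    """
    c, h, w = feat.shape
    outs = [feat]
    for b in bins:
        pooled = np.zeros((c, b, b), dtype=feat.dtype)
        ys = np.linspace(0, h, b + 1).astype(int)
        xs = np.linspace(0, w, b + 1).astype(int)
        for i in range(b):
            for j in range(b):
                pooled[:, i, j] = feat[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean(axis=(1, 2))
        # nearest-neighbor upsample back to (H, W)
        up = pooled[:, np.arange(h) * b // h, :][:, :, np.arange(w) * b // w]
        outs.append(up)
    return np.concatenate(outs, axis=0)  # (C * (1 + len(bins)), H, W)
```

The coarsest branch (b=1) is global average pooling, which gives every pixel access to image-level context, something helpful for transparent regions whose local appearance is ambiguous.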
Reference
  • [1] Xu, Y., Nagahara, H., Shimada, A., Taniguchi, R.: TransCut: Transparent object segmentation from a light-field image. In: ICCV (2015)
  • [2] Chen, G., Han, K., Wong, K.K.: TOM-Net: Learning transparent object matting from a single image. In: CVPR (2018)
  • [3] Caesar, H., Uijlings, J., Ferrari, V.: COCO-Stuff: Thing and stuff classes in context. In: CVPR (2018)
  • [4] Everingham, M., Winn, J.: The PASCAL visual object classes challenge 2012 (VOC2012) development kit. Pattern Analysis, Statistical Modelling and Computational Learning, Tech. Rep. (2011)
  • [5] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The Cityscapes dataset for semantic urban scene understanding. In: CVPR (2016)
  • [6] Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., Torralba, A.: Scene parsing through ADE20K dataset. In: CVPR (2017)
  • [7] Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: CVPR (2017)
  • [8] Zhao, H., Qi, X., Shen, X., Shi, J., Jia, J.: ICNet for real-time semantic segmentation on high-resolution images. In: ECCV (2018)
  • [9] Jin, Q., Meng, Z., Pham, T.D., Chen, Q., Wei, L., Su, R.: DUNet: A deformable network for retinal vessel segmentation. Knowledge-Based Systems (2019)
  • [10] Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. TPAMI (2017)
  • [11] Yang, M., Yu, K., Zhang, C., Li, Z., Yang, K.: DenseASPP for semantic segmentation in street scenes. In: CVPR (2018)
  • [12] Lin, G., Milan, A., Shen, C., Reid, I.: RefineNet: Multi-path refinement networks for high-resolution semantic segmentation. In: CVPR (2017)
  • [13] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR (2015)
  • [14] Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: ECCV (2018)
  • [15] Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv (2014)
  • [16] Chen, L.C., Barron, J.T., Papandreou, G., Murphy, K., Yuille, A.L.: Semantic image segmentation with task-specific edge detection using CNNs and a discriminatively trained domain transform. In: CVPR (2016)
  • [17] Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: ECCV (2018)
  • [18] Gadde, R., Jampani, V., Kiefel, M., Kappler, D., Gehler, P.V.: Superpixel convolutional networks using bilateral inceptions. In: ECCV (2016)
  • [19] Liu, S., De Mello, S., Gu, J., Zhong, G., Yang, M.H., Kautz, J.: Learning affinity via spatial propagation networks. In: NIPS (2017)
  • [20] Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: CVPR (2018)
  • [21] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Duerig, T., et al.: The Open Images dataset V4: Unified image classification, object detection, and visual relationship detection at scale. arXiv (2018)
  • [22] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
  • [23] Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In: 3DV (2016)
  • [24] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch (2017)
  • [25] Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., Sang, N.: BiSeNet: Bilateral segmentation network for real-time semantic segmentation. In: ECCV (2018)
  • [26] Liu, M., Yin, H.: Feature pyramid encoding network for real-time semantic segmentation. arXiv (2019)
  • [27] Poudel, R.P., Bonde, U., Liwicki, S., Zach, C.: ContextNet: Exploring context and detail for semantic segmentation in real-time. arXiv (2018)
  • [28] Poudel, R.P., Liwicki, S., Cipolla, R.: Fast-SCNN: Fast semantic segmentation network. arXiv (2019)
  • [29] Wu, T., Tang, S., Zhang, R., Zhang, Y.: CGNet: A light-weight context guided network for semantic segmentation. arXiv (2018)
  • [30] Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y., Liu, D., Mu, Y., Tan, M., Wang, X., et al.: Deep high-resolution representation learning for visual recognition. arXiv (2019)
  • [31] Chao, P., Kao, C.Y., Ruan, Y.S., Huang, C.H., Lin, Y.L.: HarDNet: A low memory traffic network. In: ICCV (2019)
  • [32] Li, G., Yun, I., Kim, J., Kim, J.: DABNet: Depth-wise asymmetric bottleneck for real-time semantic segmentation. arXiv (2019)
  • [33] Wang, Y., Zhou, Q., Liu, J., Xiong, J., Gao, G., Wu, X., Latecki, L.J.: LEDNet: A lightweight encoder-decoder network for real-time semantic segmentation. In: ICIP (2019)
  • [34] Yuan, Y., Wang, J.: OCNet: Object context network for scene parsing. arXiv (2018)
  • [35] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: MICCAI (2015)