Learning Fast and Robust Target Models for Video Object Segmentation

CVPR, pp. 7404-7413, 2020.

Keywords:
target appearance model, frame rate, Conjugate Gradient, appearance change, segmentation network

Abstract:

Video object segmentation (VOS) is a highly challenging problem since the initial mask, defining the target object, is only given at test-time. The main difficulty is to effectively handle appearance changes and similar background objects, while maintaining accurate segmentation. Most previous approaches fine-tune segmentation networks …

Introduction
  • The problem of video object segmentation (VOS) has a variety of important applications, including object boundary estimation for grasping [1, 25], autonomous driving [38, 40], surveillance [10, 13] and video editing [33].
  • Fine-tuning is prone to overfitting to a single view of the scene, while degrading the generic segmentation functionality learned during offline training.
  • This limits performance in more challenging videos involving drastic appearance changes, occlusions and distractor objects [49].
  • Moreover, the crucial fine-tuning step is not included in the offline training stage, which therefore does not simulate the full inference procedure.
Highlights
  • The problem of video object segmentation (VOS) has a variety of important applications, including object boundary estimation for grasping [1, 25], autonomous driving [38, 40], surveillance [10, 13] and video editing [33]
  • The target appearance model consists of a light-weight module, which is learned during the inference stage using fast optimization techniques to predict a coarse but robust target segmentation
  • We propose a video object segmentation approach, integrating a light-weight but highly discriminative target appearance model and a segmentation network
  • We find that despite its simplicity, a linear discriminative model is capable of generating robust target predictions
  • The segmentation network converts the predictions into high-quality object segmentations
  • Our method operates at high frame-rates and achieves state-of-the-art performance on the YouTube-VOS dataset and competitive results on DAVIS 2017, despite being trained on limited data
Methods
  • The authors tackle the problem of predicting accurate segmentation masks of a target object, defined in the first frame of the video.
  • This is addressed by constructing two network modules, D and S, designed for target modeling and segmentation, respectively.
  • The target model D(x; w) takes features x as input and generates a coarse, but robust, segmentation output s = D(x; w) of the target object (a minimal sketch of this two-module design follows this list).
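As a loose illustration of this two-module design, the sketch below pairs a light-weight, target-specific model D (here a single convolution over backbone features) with a generic refinement network S that turns D's coarse scores into final mask logits. The class names, layer sizes and feature dimension are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal PyTorch sketch of the D + S split (assumed names and sizes, not the
# authors' implementation). D is deliberately tiny and target-specific: its weights
# are the ones learned online. S is a generic, offline-trained refinement head.
import torch
import torch.nn as nn

class TargetModel(nn.Module):
    """D(x; w): light-weight discriminative model over backbone features."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.filter = nn.Conv2d(feat_dim, 1, kernel_size=3, padding=1, bias=False)

    def forward(self, x):            # x: (B, feat_dim, H, W) backbone features
        return self.filter(x)        # coarse but robust target score map s = D(x; w)

class RefinementNet(nn.Module):
    """S: generic segmentation head that sharpens D's coarse prediction."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(feat_dim + 1, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 3, padding=1),
        )

    def forward(self, x, coarse_scores):
        return self.head(torch.cat([x, coarse_scores], dim=1))  # final mask logits

# Toy usage with random "features" standing in for a frozen backbone's output.
x = torch.randn(1, 256, 30, 52)
D, S = TargetModel(), RefinementNet()
mask_logits = S(x, D(x))             # (1, 1, 30, 52)
```

Keeping D this small is what makes per-video optimization cheap; the capacity needed for fine boundaries lives in S, which requires no test-time training.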
Results
  • The authors analyze how the amount of training data impacts the performance of the approach.
  • For this purpose, the authors train the model on subsets of the YouTube-VOS training set containing 100%, 50%, 25% and 0% of the YouTube-VOS 2018 training split.
  • As shown in Table 4, the performance improves as the authors increase the amount of training data from 0 to 100 percent of the YouTube-VOS training split.
  • Already at 25 percent, the approach outperforms recent methods such as AGAME [22].
Conclusion
  • The authors propose a video object segmentation approach, integrating a light-weight but highly discriminative target appearance model and a segmentation network.
  • The authors find that despite its simplicity, a linear discriminative model is capable of generating robust target predictions.
  • The segmentation network converts the predictions into high-quality object segmentations.
  • The target model is efficiently trained during inference (see the fitting sketch after this list).
  • The authors' method operates at high frame-rates and achieves state-of-the-art performance on the YouTube-VOS dataset and competitive results on DAVIS 2017, despite being trained on limited data.
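To make "efficiently trained during inference" concrete, here is a simplified sketch that fits a per-pixel linear target classifier to the first frame by solving a ridge-regression problem with conjugate gradient. The paper itself optimizes a convolutional target model with a Gauss-Newton scheme, so treat this as a stand-in under assumed names and settings (lam, cg_iters), not the authors' procedure.

```python
# Hedged sketch: fit a linear target model to first-frame features and the given mask
# by solving (X^T X + lam I) w = X^T y with conjugate gradient (CG).
import torch

def fit_target_model(feats, mask, lam=1e-2, cg_iters=20):
    C, H, W = feats.shape
    X = feats.reshape(C, -1).t()              # (H*W, C) per-pixel feature vectors
    y = mask.reshape(-1).float() * 2 - 1      # targets in {-1, +1}
    A = X.t() @ X + lam * torch.eye(C)        # regularized Gram matrix (C x C)
    b = X.t() @ y

    w = torch.zeros(C)                        # standard CG iterations on A w = b
    r = b - A @ w
    p = r.clone()
    rs = r @ r
    for _ in range(cg_iters):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        w = w + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if rs_new.sqrt() < 1e-6:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return w

def coarse_scores(feats, w):
    C, H, W = feats.shape
    return (feats.reshape(C, -1).t() @ w).reshape(H, W)

# Toy usage: 256-channel features and a box-shaped first-frame mask.
feats = torch.randn(256, 30, 52)
mask = torch.zeros(30, 52)
mask[10:20, 15:30] = 1
w = fit_target_model(feats, mask)
scores = coarse_scores(feats, w)              # coarse map, to be refined by S
```

Because the model is linear and the system is only C x C, a handful of CG iterations suffices, which is what keeps this kind of inference-time training fast.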
Tables
  • Table 1: Ablative study on a validation split of 300 sequences from the YouTube-VOS train set. We analyze the different components of our approach, where D and S denote the target model and segmentation network respectively. Further, “Update” indicates if the target model update is enabled. Our target model D outperforms the Base net and is comparable to first-frame fine-tuning (“F.-T.”) even when updates are disabled. Further, the segmentation network significantly improves the raw predictions from the target model D. Finally, the best performance is obtained when additionally updating the target model D
  • Table 2: State-of-the-art comparison on the large-scale YouTube-VOS validation dataset, containing 474 videos. The results of our approach were obtained through the official evaluation server. We report the mean Jaccard (J) and boundary (F) scores for object classes that are seen and unseen in the training set, along with the overall mean (G). “seg” and “synth” indicate whether pre-trained segmentation models or additional data have been used during training. Our approaches achieve superior performance to methods that train only on the YouTube-VOS train split, while operating at high frame-rates. Furthermore, Ours-fast obtains the highest frame-rates while performing comparably to the state-of-the-art
  • Table 3: State-of-the-art comparison on the DAVIS 2017 and DAVIS 2016 validation sets. The columns with “yv”, “seg”, and “synth” indicate whether YouTube-VOS, pre-trained segmentation models or additional synthetic data has been used during training. The best and second best entries are shown in red and blue respectively. In addition to Ours and Ours-fast, we report the results of our approach when trained on only DAVIS 2017, in Ours (DV17). Our approach outperforms the compared methods while operating at practical frame-rates. Furthermore, we achieve competitive results when trained with only DAVIS 2017, owing to our discriminative target model
  • Table 4: YouTube-VOS 2018 test-dev results for different amounts of training data. Ours with 100% data is the same instance as in the comparison in Table 2 in the main paper. Ours D-only is our approach without the segmentation network, as described in Section 5.1 in the main paper; it thus requires no training data at all
  • Table 5: Distribution of time spent on the steps in the frame loop of Algorithm 1 (a hedged sketch of this loop follows the table list). The target prediction (step 5) is wrapped into “Other”
  • Table 6: The influence on mean J of varying |M0| during inference. We test two variants, trained on either YouTube-VOS only (yt) or both YouTube-VOS and DAVIS 2017 (yt+dv17). Results shown are from evaluating on our YouTube-VOS validation split (ytv) and the DAVIS 2017 validation split (dvv)
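For context on Table 5, below is a hedged sketch of the kind of frame loop the timing refers to: feature extraction, coarse prediction by D, refinement by S, extending the online training memory, and periodic re-optimization of D. It is not the paper's Algorithm 1, and the helper names (backbone, optimize_D, update_interval) are hypothetical.

```python
# Hedged sketch of a per-frame VOS loop (illustrative, not the paper's Algorithm 1).
def segment_video(frames, backbone, D, S, optimize_D, first_mask, update_interval=8):
    feats0 = backbone(frames[0])
    memory = [(feats0, first_mask)]            # initial sample from the given mask
    optimize_D(D, memory)                      # fit the target model on frame 0

    masks = [first_mask]
    for t, frame in enumerate(frames[1:], start=1):
        feats = backbone(frame)                # feature extraction
        coarse = D(feats)                      # coarse target prediction
        mask = S(feats, coarse)                # refinement into the output mask
        memory.append((feats, mask))           # grow the online training set
        if t % update_interval == 0:
            optimize_D(D, memory)              # periodic target-model update
        masks.append(mask)
    return masks
```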
Related work
  • The task of video object segmentation has seen extensive study and rapid development in recent years, largely driven by the introduction and evolution of benchmarks such as DAVIS [37] and YouTube-VOS [49].
  • First-frame fine-tuning: Most state-of-the-art approaches train a segmentation network offline, and then fine-tune it on the first frame [4, 30, 36, 48] to learn the target-specific appearance. This philosophy was extended [44] by additionally fine-tuning on subsequent video frames. Other approaches [8, 19, 29] further integrate optical flow as an additional cue. While obtaining impressive results on the DAVIS 2016 dataset, the extensive fine-tuning leads to impractically long run-times. Furthermore, such extensive fine-tuning is prone to overfitting, a problem only partially addressed by heavy data augmentation [23].
  • Non-causal methods: Another line of research approaches the VOS problem by allowing non-causal processing [2, 9, 21, 27]. In this work, we focus on the causal setting in order to accommodate real-time applications.
  • Mask propagation: Several recent methods [22, 33, 34, 36, 47, 50] employ a mask-propagation module to improve spatio-temporal consistency of the segmentation. In [36], the model is learned offline to predict the target mask through refinement of the previous frame’s segmentation output. To further avoid first-frame fine-tuning, some approaches [33, 50] concatenate the current frame features with the previous mask and a target representation generated in the first frame. Unlike these methods, we do not explicitly enforce spatio-temporal consistency through mask propagation. Instead, we use previous segmentation masks as training data for the discriminative model.
  • Feature matching: Recent methods [6, 20, 33, 34, 45, 46, 47] incorporate feature matching to locate the target object. Rather than fine-tuning the network on the first frame, these methods first construct appearance models from features corresponding to the initial target labels. Features from incoming frames are then classified using techniques inspired by classical clustering methods [6, 22] or feature matching [20, 45, 47]. In [34], a dynamic memory is used to combine feature matching from multiple previous frames.
  • Tracking: Efficient online learning of discriminative target-specific appearance models has been explored in visual tracking [15, 17]. Recently, optimization-based trackers [3, 11, 12] have achieved impressive results on benchmarks. These methods train convolution filters using efficient optimization to discriminate between target and background. The close relation between the two problem domains is made explicit in [7], where object trackers are used as external components to locate the target. Gauss-Newton has previously been used in object segmentation [42] for pose estimation of known object shapes. In contrast, we do not employ off-the-shelf trackers to predict the target or rely on target pose estimation. Instead, we take inspiration from the optimization-based learning of a discriminative model, in order to capture the target object appearance.
Funding
  • Acknowledgments: This work was supported by the ELLIIT Excellence Center at Linköping-Lund for Information Technology, the Wallenberg AI, Autonomous Systems and Software Program (WASP) and the SSF project Symbicloud
Reference
  • [1] Peter K. Allen, Aleksandar Timcenko, Billibon Yoshimi, and Paul Michelman. Automated tracking and grasping of a moving object with a robotic hand-eye system. IEEE Transactions on Robotics and Automation, 9(2):152–165, 1993.
  • [2] Linchao Bao, Baoyuan Wu, and Wei Liu. CNN in MRF: Video object segmentation via inference in a CNN-based higher-order spatio-temporal MRF. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5977–5986, 2018.
  • [3] Goutam Bhat, Martin Danelljan, Luc Van Gool, and Radu Timofte. Learning discriminative model prediction for tracking. In IEEE/CVF International Conference on Computer Vision, pages 6181–6190, 2019.
  • [4] Sergi Caelles, K.-K. Maninis, Jordi Pont-Tuset, Laura Leal-Taixe, Daniel Cremers, and Luc Van Gool. One-shot video object segmentation. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5320–5329. IEEE, 2017.
  • [5] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):834–848, 2018.
  • [6] Yuhua Chen, Jordi Pont-Tuset, Alberto Montes, and Luc Van Gool. Blazingly fast video object segmentation with pixel-wise metric learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1189–1198, 2018.
  • [7] J. Cheng, Y.-H. Tsai, W.-C. Hung, S. Wang, and M.-H. Yang. Fast and accurate online video object segmentation via tracking parts. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • [8] Jingchun Cheng, Yi-Hsuan Tsai, Shengjin Wang, and Ming-Hsuan Yang. SegFlow: Joint learning for video object segmentation and optical flow. In 2017 IEEE International Conference on Computer Vision, pages 686–695. IEEE, 2017.
  • [9] Hai Ci, Chunyu Wang, and Yizhou Wang. Video object segmentation by learning location-sensitive embeddings. In European Conference on Computer Vision, pages 524–539. Springer, 2018.
  • [10] I. Cohen and G. Medioni. Detecting and tracking moving objects for video surveillance. In Proceedings of the 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages 319–325. IEEE, 1999.
  • [11] Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, and Michael Felsberg. ATOM: Accurate tracking by overlap maximization. In IEEE Conference on Computer Vision and Pattern Recognition, 2019.
  • [12] Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, and Michael Felsberg. ECO: Efficient convolution operators for tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6638–6646, 2017.
  • [13] Adam Erdelyi, Tibor Barat, Patrick Valet, Thomas Winkler, and Bernhard Rinner. Adaptive cartooning for privacy protection in camera networks. In 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pages 44–49. IEEE, 2014.
  • [14] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1):98–136, Jan. 2015.
  • [15] Sam Hare, Stuart Golodetz, Amir Saffari, Vibhav Vineet, Ming-Ming Cheng, Stephen L. Hicks, and Philip H. S. Torr. Struck: Structured output tracking with kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(10):2096–2109, 2016.
  • [16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
  • [17] Joao F. Henriques, Rui Caseiro, Pedro Martins, and Jorge Batista. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3):583–596, 2015.
  • [18] Magnus R. Hestenes and Eduard Stiefel. Methods of conjugate gradients for solving linear systems, volume 49. NBS, Washington, DC, 1952.
  • [19] Ping Hu, Gang Wang, Xiangfei Kong, Jason Kuen, and Yap-Peng Tan. Motion-guided cascaded refinement network for video object segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1400–1409, 2018.
  • [20] Yuan-Ting Hu, Jia-Bin Huang, and Alexander G. Schwing. VideoMatch: Matching based video object segmentation. In European Conference on Computer Vision, pages 56–73. Springer, 2018.
  • [21] Won-Dong Jang and Chang-Su Kim. Online video object segmentation via convolutional trident network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5849–5858, 2017.
  • [22] Joakim Johnander, Martin Danelljan, Emil Brissman, Fahad Shahbaz Khan, and Michael Felsberg. A generative appearance model for end-to-end video object segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • [23] Anna Khoreva, Rodrigo Benenson, Eddy Ilg, Thomas Brox, and Bernt Schiele. Lucid data dreaming for video object segmentation. International Journal of Computer Vision, 127(9):1175–1197, Sep. 2019.
  • [24] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2014.
  • [25] Hedvig Kjellstrom, Javier Romero, and Danica Kragic. Visual recognition of grasps for human-to-robot mapping. In 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3192–3199. IEEE, 2008.
  • [26] Matej Kristan, Ales Leonardis, Jiri Matas, Michael Felsberg, Roman Pflugfelder, Luka Cehovin Zajc, Tomas Vojir, Goutam Bhat, Alan Lukezic, Abdelrahman Eldesokey, Gustavo Fernandez, et al. The sixth visual object tracking VOT2018 challenge results. In ECCV Workshops, 2018.
  • [27] Xiaoxiao Li and Chen Change Loy. Video object segmentation with joint re-identification and attention-aware mask propagation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 90–105, 2018.
  • [28] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014.
  • [29] Jonathon Luiten, Paul Voigtlaender, and Bastian Leibe. PReMVOS: Proposal-generation, refinement and merging for video object segmentation. In Asian Conference on Computer Vision, pages 565–580. Springer, 2018.
  • [30] Kevis-Kokitsi Maninis, Sergi Caelles, Yuhua Chen, Jordi Pont-Tuset, Laura Leal-Taixe, Daniel Cremers, and Luc Van Gool. Video object segmentation without temporal information. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2018.
  • [31] Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11:431–441, 1963.
  • [32] Andrew Y. Ng and Michael I. Jordan. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In Advances in Neural Information Processing Systems, pages 841–848, 2002.
  • [33] Seoung Wug Oh, Joon-Young Lee, Kalyan Sunkavalli, and Seon Joo Kim. Fast video object segmentation by reference-guided mask propagation. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7376–7385. IEEE, 2018.
  • [34] Seoung Wug Oh, Joon-Young Lee, Ning Xu, and Seon Joo Kim. Video object segmentation using space-time memory networks. In Proceedings of the IEEE International Conference on Computer Vision, 2019.
  • [35] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. 2017.
  • [36] Federico Perazzi, Anna Khoreva, Rodrigo Benenson, Bernt Schiele, and Alexander Sorkine-Hornung. Learning video object segmentation from static images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2663–2672, 2017.
  • [37] F. Perazzi, J. Pont-Tuset, B. McWilliams, L. Van Gool, M. Gross, and A. Sorkine-Hornung. A benchmark dataset and evaluation methodology for video object segmentation. In Computer Vision and Pattern Recognition, 2016.
  • [38] German Ros, Sebastian Ramos, Manuel Granados, Amir Bakhtiary, David Vazquez, and Antonio M. Lopez. Vision-based offline-online perception paradigm for autonomous driving. In 2015 IEEE Winter Conference on Applications of Computer Vision, pages 231–238. IEEE, 2015.
  • [39] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.
  • [40] Khaled Saleh, Mohammed Hossny, and Saeid Nahavandi. Kangaroo vehicle collision detection using deep semantic segmentation convolutional neural network. In 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pages 1–7. IEEE, 2016.
  • [41] Alexandru Telea. An image inpainting technique based on the fast marching method. Journal of Graphics Tools, 9(1):23–34, 2004.
  • [42] Henning Tjaden, Ulrich Schwanecke, Elmar Schomer, and Daniel Cremers. A region-based Gauss-Newton approach to real-time monocular multiple object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
  • [43] Carles Ventura, Miriam Bellver, Andreu Girbau, Amaia Salvador, Ferran Marques, and Xavier Giro-i-Nieto. RVOS: End-to-end recurrent network for video object segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5277–5286, 2019.
  • [44] Paul Voigtlaender and Bastian Leibe. Online adaptation of convolutional neural networks for video object segmentation. In BMVC, 2017.
  • [45] Paul Voigtlaender and Bastian Leibe. FEELVOS: Fast end-to-end embedding learning for video object segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • [46] Carl Vondrick, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, and Kevin Murphy. Tracking emerges by colorizing videos. In European Conference on Computer Vision, pages 402–419. Springer, 2018.
  • [47] Ziqin Wang, Jun Xu, Li Liu, Fan Zhu, and Ling Shao. RANet: Ranking attention network for fast video object segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pages 3978–3987, 2019.
  • [48] Ning Xu, Linjie Yang, Yuchen Fan, Jianchao Yang, Dingcheng Yue, Yuchen Liang, Brian Price, Scott Cohen, and Thomas Huang. YouTube-VOS: Sequence-to-sequence video object segmentation. In European Conference on Computer Vision. Springer, 2018.
  • [49] Ning Xu, Linjie Yang, Yuchen Fan, Dingcheng Yue, Yuchen Liang, Jianchao Yang, and Thomas Huang. YouTube-VOS: A large-scale video object segmentation benchmark. arXiv preprint arXiv:1809.03327, 2018.
  • [50] Linjie Yang, Yanran Wang, Xuehan Xiong, Jianchao Yang, and Aggelos K. Katsaggelos. Efficient video object segmentation via network modulation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6499–6507, 2018.
  • [51] Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, and Nong Sang. Learning a discriminative feature network for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1857–1866, 2018.
  • [52] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2881–2890, 2017.