Towards Universal Object Detection by Domain Attention

CVPR, 2019.

Keywords:
universal object detection benchmark, fully connected, multi-task learning, human face, region proposal network

Abstract:

Despite increasing efforts on universal representations for visual recognition, few have addressed object detection. In this paper, we develop an effective and efficient universal object detection system that is capable of working on various image domains, from human faces and traffic signs to medical CT images. Unlike multi-domain models...

Introduction
  • There has been significant progress in object detection in recent years [11, 44, 2, 26, 13, 3], powered by the availability of challenging and diverse object detection datasets, e.g. PASCAL VOC [6], COCO [27], KITTI [9], WiderFace [58], etc.
  • Existing detectors are usually domain-specific, e.g. trained and tested on a single dataset.
  • This is partly due to the fact that object detection datasets are diverse and there is a nontrivial domain shift between them.
  • High detection performance requires a detector specialized on the target dataset.
  • This poses a significant problem for practical applications, which are not usually restricted to any one of these domains.
Highlights
  • There has been significant progress in object detection in recent years [11, 44, 2, 26, 13, 3], powered by the availability of challenging and diverse object detection datasets, e.g. PASCAL VOC [6], COCO [27], KITTI [9], WiderFace [58], etc
  • We introduce a domain attention module inspired by Squeeze-and-Excitation to make data-driven domain assignments of network activations, for the more challenging problem of universal object detection
  • The domain-attentive universal detector (“universal+domain attention”) improves baseline performance by 4.4 points with a 5-fold parameter decrease. It has large performance gains (>5 points) on DeepLesion, Comic, and Clipart. This is because Comic/Clipart contain underpopulated classes, greatly benefiting from information leveraged from other domains
  • To investigate what was learned by the domain attention module of Figure 5 (b), we show the soft assignments of each dataset, averaged over its validation set, in Figure 6
  • We proposed a universal detector that requires no prior domain knowledge, consisting of a single network that is active for all tasks
  • The proposed detector achieves domain sensitivity through a novel data-driven domain adaptation module and was shown to outperform multiple universal/multi-domain detectors on a newly established benchmark, and even individual detectors optimized for a single task
Methods
  • The authors used a PyTorch implementation [57] of Faster R-CNN, with an SE-ResNet-50 [15] pretrained on ImageNet as the backbone for all detectors.
  • As is common for detection, the first convolutional layer, the first residual block, and all BN layers are frozen during training (a minimal sketch of this freezing policy follows this list).
  • These settings were used in all experiments, unless otherwise noted.
  • Both multi-domain and universal detectors were trained on all domains of interest simultaneously
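A minimal PyTorch sketch of how the freezing policy above might be applied to the backbone is shown below. The module names ("conv1", "layer1") assume a torchvision-style ResNet layout, and the helper itself is illustrative rather than the authors' code.

```python
import torch.nn as nn


def freeze_backbone(backbone: nn.Module) -> nn.Module:
    """Freeze the first conv layer, the first residual block, and all BN layers.

    Module names assume a torchvision-style ResNet ("conv1", "layer1", ...)."""
    for name, module in backbone.named_modules():
        first_conv = name == "conv1"
        first_block = name == "layer1" or name.startswith("layer1.")
        is_bn = isinstance(module, nn.BatchNorm2d)
        if first_conv or first_block or is_bn:
            for p in module.parameters(recurse=False):
                p.requires_grad = False
            if is_bn:
                # Also fix the running statistics; in a real training loop this
                # needs to be re-applied after every call to model.train().
                module.eval()
    return backbone
```

Only the parameters left with requires_grad=True would then be handed to the optimizer.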
Results
  • Results on the full benchmark

    Table 3 presents results on the full benchmark. The settings are as above, except that the authors used 10 epochs with learning rate 0.1 and 4 epochs with 0.01, on 8 GPUs, each holding 2 images (a schedule sketch follows this list).
  • The domain-attentive universal detector (“universal+DA”) improves baseline performance by 4.4 points with a 5-fold parameter decrease.
  • It has large performance gains (>5 points) on DeepLesion, Comic, and Clipart.
  • For the universal detector, joint training is not always beneficial
  • This shows the importance of domain sensitivity for universal detection
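For concreteness, the reported schedule (10 epochs at learning rate 0.1 followed by 4 epochs at 0.01) can be written as a step decay. The sketch below uses a stand-in model, and the momentum and weight-decay values are assumptions, not settings reported in the paper.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)  # stand-in for the detector; illustrative only
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
# Drop the learning rate by 10x after 10 epochs, for 14 (= 10 + 4) epochs in total.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10], gamma=0.1)

for epoch in range(14):
    # ... one training epoch over all domains would go here ...
    scheduler.step()
    print(epoch, scheduler.get_last_lr())  # drops from [0.1] to [0.01] after 10 steps
```

The 8-GPU layout with 2 images per GPU is a data-loading/parallelism detail and is omitted from the sketch.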
Conclusion
  • The authors have investigated the unexplored and challenging problem of universal/multi-domain object detection.
  • The authors proposed a universal detector that requires no prior domain knowledge, consisting of a single network that is active for all tasks.
  • The proposed detector achieves domain sensitivity through a novel data-driven domain adaptation module and was shown to outperform multiple universal/multi-domain detectors on a newly established benchmark, and even individual detectors optimized for a single task
Tables
  • Table1: Dataset details, domain-specific hyperparameters, and performance of the single-domain detectors. “T/V/T” means train/val/test, “size” the shortest side of the inputs, “BS” the RPN batch size, and “S/R” the anchor scales/aspect ratios
  • Table2: Comparison on multi-domain detection. † denotes fixed assignment. “time” is the relative run-time on the five datasets when the domain is unknown
  • Table3: Overall results on the full universal object detection benchmark (11 datasets)
  • Table4: The effect of the number of SE adapters
  • Table5: Comparison with the official evaluation on Pascal VOC, KITTI, DeepLesion, Clipart, Watercolor, Comic and WiderFace
Related work
  • Object Detection: The two-stage detection framework of the R-CNN [12], Fast R-CNN [11] and Faster R-CNN [44] detectors has achieved great success in recent years, and many works have expanded this base architecture. For example, MS-CNN [2] and FPN [26] built a feature pyramid to effectively detect objects of various scales; the R-FCN [4] proposed position-sensitive pooling for further speedups; and the Cascade R-CNN [3] introduced a multi-stage cascade for high-quality object detection. In parallel, single-stage detectors, such as YOLO [42] and SSD [29], became popular for their fairly good performance and high speed. However, none of these detectors can reach high detection performance on more than one dataset/domain without finetuning. In the pre-deep-learning era, [23] proposed a universal DPM [8] detector by adding dataset-specific biases to the DPM, but this solution is limited, since the DPM is not comparable to deep learning detectors.
  • Multi-Task Learning: Multi-task learning (MTL) investigates how to jointly learn multiple tasks simultaneously, assuming a single input domain. Various multi-task networks [25, 62, 13, 28, 50, 63] have been proposed for the joint solution of tasks such as object recognition, object detection, segmentation, edge detection, human pose, depth, action recognition, etc., by leveraging information sharing across tasks. However, the sharing is not always beneficial, sometimes hurting performance [7, 22]. To address this, [32] proposed a cross-stitch unit, which combines tasks of different types, eliminating the need to search through several architectures on a per-task basis. [62] studied the common structure and relationships of several different tasks.
  • Multi-Domain Learning/Adaptation: Multi-domain learning (MDL) addresses the learning of representations for multiple domains, known a priori [20, 36]. It uses a combination of parameters that are shared across domains and domain-specific parameters. The latter are adaptation parameters, inspired by works on domain adaptation [38, 30, 46, 31], where a model learned from a source domain is adapted to a target domain. [1] showed that multi-domain learning is feasible by simply adding domain-specific BN layers to an otherwise shared network. [40] learned multiple visual domains with residual adapters, while [41] empirically studied efficient parameterizations. However, these methods build on BN layers and are not suitable for detection, due to the batch constraints of detector training. Instead, we propose an alternative, the SE adapter, inspired by “Squeeze-and-Excitation” [15], to solve this problem.
  • Attention Module: [49] proposed a self-attention module for machine translation and, similarly, [51] proposed a non-local network for video classification, based on a space-time dependency/attention mechanism. [15] focused on channel relationships, introducing the SE module to adaptively recalibrate channel-wise feature responses, which achieved good results on ImageNet recognition. In this work, we introduce a domain attention module inspired by SE to make data-driven domain assignments of network activations, for the more challenging problem of universal object detection (a minimal sketch of such a module follows below).
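The following is a minimal, self-contained PyTorch sketch of how a domain attention module of this kind might look: a bank of SE adapters whose channel-wise excitations are mixed by a data-driven softmax assignment. The class and argument names (SEAdapter, DomainAttention, n_adapters, reduction) are illustrative assumptions rather than the authors' implementation; with n_adapters = 1 the module reduces to a standard SE block, consistent with the ablation on the number of adapters (Table 4).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SEAdapter(nn.Module):
    """One SE-style adapter: two FC layers applied to globally pooled features."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, pooled):                      # pooled: (B, C)
        return self.fc2(F.relu(self.fc1(pooled)))   # per-channel excitation logits, (B, C)


class DomainAttention(nn.Module):
    """Bank of SE adapters mixed by a data-driven softmax assignment.

    With n_adapters == 1 this reduces to a standard SE block."""

    def __init__(self, channels, n_adapters=5, reduction=16):
        super().__init__()
        self.adapters = nn.ModuleList(
            [SEAdapter(channels, reduction) for _ in range(n_adapters)])
        self.assign = nn.Linear(channels, n_adapters)   # soft domain assignment

    def forward(self, x):                               # x: (B, C, H, W)
        pooled = x.mean(dim=(2, 3))                     # squeeze: (B, C)
        weights = F.softmax(self.assign(pooled), dim=1)                  # (B, N)
        excitations = torch.stack(
            [adapter(pooled) for adapter in self.adapters], dim=1)       # (B, N, C)
        mixed = (weights.unsqueeze(-1) * excitations).sum(dim=1)         # (B, C)
        scale = torch.sigmoid(mixed).view(x.size(0), -1, 1, 1)           # (B, C, 1, 1)
        return x * scale                                # channel-wise recalibration


# Example: recalibrate a batch of intermediate feature maps.
features = torch.randn(2, 256, 32, 32)
da = DomainAttention(channels=256, n_adapters=5)
print(da(features).shape)                               # torch.Size([2, 256, 32, 32])
```

In the paper such modules are inserted into the residual blocks of the shared backbone, so that domain assignment is made per image rather than from any prior domain label.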
Funding
  • This work was partially funded by NSF awards IIS-1546305 and IIS-1637941, a gift from 12 Sigma Technologies, and NVIDIA GPU donations
Study subjects and analysis
diverse datasets: 11
In the proposed universal detector, all parameters and computations are shared across domains, and a single network processes all domains all the time. Experiments, on a newly established universal object detection benchmark of 11 diverse datasets, show that the proposed detector outperforms a bank of individual detectors, a multi-domain detector, and a baseline universal detector, with a 1.3× parameter increase over a single-domain baseline detector. The code and benchmark are available at http://www.svcl.ucsd.edu/projects/universal-detection/

diverse object detection datasets: 11
In this work, we consider the design of an object detector capable of operating over multiple domains. We begin by establishing a new universal object detection benchmark, denoted as UODB, consisting of 11 diverse object detection datasets (see Figure 1). This is significantly more challenging than the Decathlon [40] benchmark for multi-domain recognition

datasets: 11
Universal Object Detection Benchmark. To train and evaluate universal/multi-domain object detection systems, we established a new universal object detection benchmark (UODB) of 11 datasets: Pascal VOC [6], WiderFace [58], KITTI [9], LISA [33], DOTA [53], COCO [27], Watercolor [17], Clipart [17], Comic [17], Kitchen [10] and DeepLesion [55]. This set includes the popular VOC [6] and COCO [27], composed of images of everyday objects, e.g. bikes, humans, animals, etc.

datasets: 11
It leads to the multi-domain detector of Figure 2 (b). Compared to Figure 2 (a), this model is up to 5 times smaller, while achieving better overall performance across the 11 datasets.

datasets: 11
In the literature, where detectors are tested on a single domain, these are tuned to the target dataset, for best performance. This is difficult, and very tedious, to do over the 11 datasets now considered. We use the same hyperparameters across datasets, except when this is critical for performance and relatively easy to do, e.g. the choice of anchors
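As an illustration of this kind of per-dataset anchor choice, the sketch below builds a torchvision-style RPN anchor generator from a hypothetical configuration. The scale/ratio values are placeholders and not the settings listed in Table 1, and the paper itself uses a different Faster R-CNN implementation [57].

```python
from torchvision.models.detection.rpn import AnchorGenerator

# Hypothetical per-dataset anchor settings; the actual values used in the paper
# are those listed in Table 1 ("S/R"). These numbers are placeholders.
ANCHOR_CONFIG = {
    "WiderFace": {"sizes": (8, 16, 32, 64, 128), "ratios": (1.0,)},         # tiny faces
    "KITTI":     {"sizes": (32, 64, 128, 256),   "ratios": (0.5, 1.0, 2.0)},
    "VOC":       {"sizes": (128, 256, 512),      "ratios": (0.5, 1.0, 2.0)},
}


def make_anchor_generator(dataset: str) -> AnchorGenerator:
    cfg = ANCHOR_CONFIG[dataset]
    # AnchorGenerator expects one tuple of sizes/ratios per feature map level.
    return AnchorGenerator(sizes=(cfg["sizes"],), aspect_ratios=(cfg["ratios"],))


rpn_anchors = make_anchor_generator("WiderFace")
```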

datasets: 5
Table 2 compares the multi-domain object detection performance of all architectures of Figure 2. For simplicity, only five datasets (VOC, KITTI, WiderFace, LISA and Kitchen) were used in this section. The table confirms that the adaptive multi-domain detector of Section 3.3 (“adaptive”) is light-weight, only adding ∼11M parameters to the Faster R-CNN over the five datasets

datasets: 5
For simplicity, only five datasets (VOC, KITTI, WiderFace, LISA and Kitchen) were used in this section. The table confirms that the adaptive multi-domain detector of Section 3.3 (“adaptive”) is light-weight, only adding ∼11M parameters to the Faster R-CNN over the five datasets. Nevertheless, it outperforms the much more expensive single-domain detector bank by 0.7 points

datasets: 5
This (denoted “universal+DA†”) caused a performance drop of 0.5 points. Finally, Table 2 shows the relative run-times of all methods on the five datasets, when the domain is unknown. It can be seen that “universal+DA” is about 4× faster than the multi-domain detectors (“single-domain” and “adaptive”) and only 1.33× slower than “universal”.

datasets: 5
Table 4 summarizes how the performance of the domain attentive universal detector depends on N. For simplicity, we again use 5 datasets in this experiment. For a single adapter, the DA module reduces to the standard SE module, and the domain attentive universal detector to the universal detector

datasets: 11
A comparison of the first and the last blocks of each residual stage, e.g. “DA 4 1” vs. “DA 4 6”, shows that the latter are much less domain sensitive than the former, suggesting that they could be made universal. To test this hypothesis, we trained a model with only 6 SE adapters for the 11 datasets, placed only in the first and middle blocks, e.g. “DA 4 1” and “DA 4 3”. This model, “universal+DA*”, achieved the best performance with far fewer parameters than the “universal+DA” detector with 11 adapters.

datasets: 11
Official evaluation. Since, to the best of our knowledge, this is the first work to explore universal/multi-domain object detection on 11 datasets, there is no literature for a direct comparison. Instead, we compared the “universal+DA*” detector of Table 3 to the literature using the official evaluation for each dataset

datasets: 11
On the other hand, on DeepLesion and CrossDomain (Clipart, Comic and Watercolor), see Table 5c and 5d respectively, the domain attentive universal detector significantly outperformed the state-of-the-art. Overall, these results show that a single detector, which operates on 11 datasets, is competitive with single-domain detectors in highly researched datasets, such as VOC or KITTI, and substantially better than the state-of-the-art in less explored domains. This is achieved with a relatively minor increase in parameters, vastly smaller than that needed to deploy 11 single-task detectors.


Reference
  • Hakan Bilen and Andrea Vedaldi. Universal representations: The missing link between faces, text, planktons, and cat breeds. arXiv preprint arXiv:1701.07275, 2017.
  • Zhaowei Cai, Quanfu Fan, Rogerio S Feris, and Nuno Vasconcelos. A unified multi-scale deep convolutional neural network for fast object detection. In ECCV, pages 354–370, 2016.
  • Zhaowei Cai and Nuno Vasconcelos. Cascade R-CNN: Delving into high quality object detection. In CVPR, 2018.
  • Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. R-fcn: Object detection via region-based fully convolutional networks. In NeurIPS, pages 379–387, 2016.
  • Mark Dredze, Alex Kulesza, and Koby Crammer. Multidomain learning by confidence-weighted parameter combination. Machine Learning, 79(1-2):123–149, 2010.
  • Mark Everingham, SM Ali Eslami, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes challenge: A retrospective. International journal of computer vision, 111(1):98–136, 2015.
  • Theodoros Evgeniou, Charles A Micchelli, and Massimiliano Pontil. Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6(Apr):615–637, 2005.
  • Pedro F Felzenszwalb, Ross B Girshick, David McAllester, and Deva Ramanan. Object detection with discriminatively trained part-based models. IEEE transactions on pattern analysis and machine intelligence, 32(9):1627–1645, 2010.
  • Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? the kitti vision benchmark suite. In CVPR, pages 3354–3361, 2012.
  • Georgios Georgakis, Md Alimoor Reza, Arsalan Mousavian, Phi-Hung Le, and Jana Kosecka. Multiview rgb-d dataset for object instance detection. arXiv preprint arXiv:1609.07826, 2016.
  • Ross Girshick. Fast r-cnn. In ICCV, pages 1440–1448, 2015.
  • Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, pages 580–587, 2014.
  • Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross Girshick. Mask r-cnn. In ICCV, pages 2980–2988, 2017.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
  • Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. arXiv preprint arXiv:1709.01507, 2017.
  • Peiyun Hu and Deva Ramanan. Finding tiny faces. In CVPR, pages 1522–1530, 2017.
  • Naoto Inoue, Ryosuke Furuta, Toshihiko Yamasaki, and Kiyoharu Aizawa. Cross-domain weakly-supervised object detection through progressive domain adaptation. In CVPR, pages 5001–5009, 2018.
  • Laurent Itti and Pierre Baldi. A principled approach to detecting surprising events in video. In CVPR, volume 1, pages 631–637, 2005.
  • Wei Jiang, Eric Zavesky, Shih-Fu Chang, and Alex Loui. Cross-domain learning methods for high-level visual concept classification. In ICIP, pages 161–164. IEEE, 2008.
  • Mahesh Joshi, William W Cohen, Mark Dredze, and Carolyn P Rose. Multi-domain learning: when do domains matter? In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1302–1312. Association for Computational Linguistics, 2012.
  • Lukasz Kaiser, Aidan N Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, and Jakob Uszkoreit. One model to learn them all. arXiv preprint arXiv:1706.05137, 2017.
  • Tsuyoshi Kato, Hisashi Kashima, Masashi Sugiyama, and Kiyoshi Asai. Multi-task learning via conic programming. In NeurIPS, pages 737–744, 2008.
  • Aditya Khosla, Tinghui Zhou, Tomasz Malisiewicz, Alexei A Efros, and Antonio Torralba. Undoing the damage of dataset bias. In ECCV, pages 158–171, 2012.
  • Taeksoo Kim, Moonsu Cha, Hyunsoo Kim, Jung Kwon Lee, and Jiwon Kim. Learning to discover cross-domain relations with generative adversarial networks. arXiv preprint arXiv:1703.05192, 2017.
  • Iasonas Kokkinos. Ubernet: Training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory. In CVPR, page 8, 2017.
  • Tsung-Yi Lin, Piotr Dollar, Ross B Girshick, Kaiming He, Bharath Hariharan, and Serge J Belongie. Feature pyramid networks for object detection. In CVPR, page 4, 2017.
  • Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In ECCV, pages 740–755, 2014.
  • Anan Liu, Yuting Su, Weizhi Nie, and Mohan S Kankanhalli. Hierarchical clustering multi-task learning for joint human action grouping and recognition. IEEE Trans. Pattern Anal. Mach. Intell., 39(1):102–114, 2017.
  • Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In ECCV, pages 21–37, 2016.
  • Mingsheng Long, Yue Cao, Jianmin Wang, and Michael I Jordan. Learning transferable features with deep adaptation networks. arXiv preprint arXiv:1502.02791, 2015.
  • Arun Mallya, Dillon Davis, and Svetlana Lazebnik. Piggyback: Adapting a single network to multiple tasks by learning to mask weights. In ECCV, pages 67–82, 2018.
  • Ishan Misra, Abhinav Shrivastava, Abhinav Gupta, and Martial Hebert. Cross-stitch networks for multi-task learning. In CVPR, pages 3994–4003, 2016.
  • Andreas Møgelmose, Mohan M Trivedi, and Thomas B Moeslund. Vision-based traffic sign detection and analysis for intelligent driver assistance systems: Perspectives and survey. IEEE Trans. Intelligent Transportation Systems, 13(4):1484–1497, 2012.
  • Mahyar Najibi, Pouya Samangouei, Rama Chellappa, and Larry S Davis. Ssh: Single stage headless face detector. In ICCV, pages 4885–4894, 2017.
  • Hyeonseob Nam and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. In CVPR, June 2016.
  • Hyeonseob Nam and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. In CVPR, pages 4293–4302, 2016.
  • Stephen E Palmer. Vision science: Photons to phenomenology. MIT press, 1999.
  • Vishal M Patel, Raghuraman Gopalan, Ruonan Li, and Rama Chellappa. Visual domain adaptation: A survey of recent advances. IEEE signal processing magazine, 32(3):53–69, 2015.
  • Charles R Qi, Wei Liu, Chenxia Wu, Hao Su, and Leonidas J Guibas. Frustum pointnets for 3d object detection from rgb-d data. 2018.
  • Sylvestre-Alvise Rebuffi, Hakan Bilen, and Andrea Vedaldi. Learning multiple visual domains with residual adapters. In NeurIPS, pages 506–516, 2017.
  • Sylvestre-Alvise Rebuffi, Hakan Bilen, and Andrea Vedaldi. Efficient parametrization of multi-domain deep neural networks. In CVPR, pages 8119–8127, 2018.
  • Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In CVPR, pages 779–788, 2016.
  • Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
  • Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In NeurIPS, pages 91–99, 2015.
  • Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis & Machine Intelligence, (6):1137–1149, 2017.
  • Amir Rosenfeld and John K Tsotsos. Incremental learning through deep adaptation. IEEE transactions on pattern analysis and machine intelligence, 2018.
  • Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. Adversarial discriminative domain adaptation. In CVPR, page 4, 2017.
  • Parishwad P Vaidyanathan. Multirate systems and filter banks. Pearson Education India, 1993.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, pages 5998–6008, 2017.
  • Peng Wang, Xiaohui Shen, Zhe Lin, Scott Cohen, Brian Price, and Alan L Yuille. Towards unified depth and semantic prediction from a single image. In CVPR, pages 2800–2809, 2015.
  • Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. Non-local neural networks. In CVPR, 2018.
  • Jeremy M Wolfe. Visual attention. In Seeing, pages 335–386.
  • Gui-Song Xia, Xiang Bai, Jian Ding, Zhen Zhu, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, and Liangpei Zhang. Dota: A large-scale dataset for object detection in aerial images. In Proc. CVPR, 2018.
  • Ke Yan, Mohammadhadi Bagheri, and Ronald M Summers. 3d context enhanced region-based convolutional neural network for end-to-end lesion detection. In ICCV, pages 511–519, 2018.
  • Ke Yan, Xiaosong Wang, Le Lu, Ling Zhang, Adam Harrison, Mohammadhadi Bagheri, and Ronald M Summers. Deep lesion graphs in the wild: relationship learning and organization of significant radiology image findings in a diverse large-scale lesion database. In IEEE CVPR, 2018.
  • Fan Yang, Wongun Choi, and Yuanqing Lin. Exploit all the layers: Fast and accurate cnn object detector with scale dependent pooling and cascaded rejection classifiers. In CVPR, pages 2129–2137, 2016.
  • Jianwei Yang, Jiasen Lu, Dhruv Batra, and Devi Parikh. A faster pytorch implementation of faster r-cnn. https://github.com/jwyang/faster-rcnn.pytorch, 2017.
  • Shuo Yang, Ping Luo, Chen-Change Loy, and Xiaoou Tang. Wider face: A face detection benchmark. In CVPR, pages 5525–5533, 2016.
  • Yongxin Yang and Timothy M Hospedales. A unified perspective on multi-domain and multi-task learning. arXiv preprint arXiv:1412.7489, 2014.
  • Steven Yantis. Control of visual attention. attention, 1(1):223–256, 1998.
  • Alfred L Yarbus. Eye movements during perception of complex objects. In Eye movements and vision, pages 171–211, 1967.
  • Amir R Zamir, Alexander Sax, William Shen, Leonidas Guibas, Jitendra Malik, and Silvio Savarese. Taskonomy: Disentangling task transfer learning. In CVPR, pages 3712–3722, 2018.
  • Yu Zhang and Qiang Yang. An overview of multi-task learning. National Science Review, 5(1):30–43, 2017.