Characterizing and Avoiding Negative Transfer
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
Keywords: unsupervised domain adaptation, labeled data, target domain, generative adversarial network, source task
Abstract:
When labeled data is scarce for a specific target task, transfer learning often offers an effective solution by utilizing data from a related source task. However, when transferring knowledge from a less related source, it may inversely hurt the target performance, a phenomenon known as negative transfer. Despite its pervasiveness, negative transfer is usually described in an informal manner, lacking rigorous definition, careful analysis, or systematic treatment.
Introduction
- The development of deep neural networks (DNNs) has improved the state-of-the-art performance on a wide range of machine learning problems and applications.
- When labeled data in the target domain is scarce, transfer learning leverages data from a separate but related source domain; the hope is that transferring knowledge from this source can improve performance within the target domain.
- This powerful paradigm has been studied under various settings [35] and has proven effective in a wide range of applications [39, 16, 17].
Highlights
- The development of deep neural networks (DNNs) has improved the state-of-the-art performance on a wide range of machine learning problems and applications
- To reveal the three dependent factors, we study the effect of negative transfer under different methods with varying perturbation rates and target labeled data (L%)
- The performance of the Domain Adversarial Neural Network (DANN) under different settings of ε and L% on two tasks of Office-31 is shown in Table 1
- While DANN is better than the target-only baseline DANN_T when ε is small, its performance degrades quickly as ε increases and drops below DANN_T, indicating the occurrence of negative transfer (a formal sketch of the negative transfer gap follows this list)
- We show that negative transfer directly relates to specific algorithms, domain divergence and target data
- Our method consistently improves the performance of base methods and largely avoids negative transfer
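The notion of negative transfer in these highlights can be stated compactly. As a sketch consistent with the paper's setup (the notation here is ours, not necessarily the paper's verbatim formulation), let ε_T(·) denote test error on the target domain and A(S, T) the hypothesis a transfer algorithm produces from source data S and labeled target data T:

```latex
% Negative transfer gap (NTG): how much worse the algorithm does with the
% source data than without it; for DANN, the target-only DANN_T plays the
% role of A(\emptyset, T).
\[
\mathrm{NTG} \;=\; \epsilon_T\big(\mathcal{A}(S, T)\big) \;-\; \epsilon_T\big(\mathcal{A}(\emptyset, T)\big),
\qquad
\text{negative transfer occurs} \;\Longleftrightarrow\; \mathrm{NTG} > 0 .
\]
```

A positive gap indicates that transferring from the source hurt rather than helped, which is why the tables below report NTG alongside raw accuracy.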
Methods
- As discussed in Section 3, the key to achieving successful transfer and avoiding negative effects is to discover and exploit shared underlying structures between P_S(X, Y) and P_T(X, Y) (a sketch of the discriminator-gate idea appears after this list).
- The authors can see that while both DANN and DANN_T perform better with more labeled target data, DANN is affected by the divergence factor and is outperformed by DANN_T when L% becomes larger.
- This observation shows that negative transfer is relative: it also depends on the amount of labeled target data.
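To make the discriminator-gate idea concrete, below is a minimal sketch of how a domain discriminator's output could reweight source samples in the classification loss. The function and variable names are illustrative assumptions, and the paper's DANN_gate implementation may differ in details such as normalization.

```python
import torch.nn.functional as F


def gated_source_loss(classifier_logits, labels, disc_prob_target):
    """Discriminator-gated source classification loss (illustrative sketch).

    classifier_logits: (n, num_classes) logits for a batch of source samples.
    labels:            (n,) integer class labels for the same batch.
    disc_prob_target:  (n,) discriminator estimate that each (x, y) pair comes
                       from the target joint distribution.

    The density ratio p_T(x, y) / p_S(x, y) is approximated by d / (1 - d) and
    used to down-weight source samples that look unrelated to the target; the
    exact weighting and normalization here are assumptions, not the paper's
    verbatim implementation.
    """
    per_sample_ce = F.cross_entropy(classifier_logits, labels, reduction="none")
    ratio = disc_prob_target / (1.0 - disc_prob_target).clamp(min=1e-6)
    weights = ratio.detach()                            # gate only; no gradient through weights
    weights = weights / weights.sum().clamp(min=1e-6)   # keep the loss on a comparable scale
    return (weights * per_sample_ce).sum()
```

The intuition is that source pairs the discriminator judges unlikely under the target joint distribution receive small weights, so unrelated source data contributes little to training and negative transfer is attenuated.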
Results
- Results and Analysis
5.3.1 Study of Negative Transfer
To reveal the three dependent factors, the authors study the effect of negative transfer under different methods with varying perturbation rates and target labeled data (L%).
Divergence factor. - By fixing ε_y = 0 and using two domains W and D that are known to be similar, the authors study negative transfer under the covariate shift assumption in Table 3, and observe that negative transfer does not occur even with high ε_x and a decent L% (the covariate shift assumption is restated after this list)
- These experimental results confirm that distribution divergence is an important factor in negative transfer
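For reference, the covariate shift assumption under which Table 3 is run states that the two domains share the labeling function and differ only in their input marginals:

```latex
\[
P_S(X) \neq P_T(X), \qquad P_S(Y \mid X) = P_T(Y \mid X).
\]
```

With ε_y = 0 only the input marginal is perturbed, so the shared conditional is preserved; this is consistent with negative transfer not appearing in that setting even for large ε_x.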
Conclusion
- The authors analyze the problem of negative transfer and propose a novel discriminator gate technique to avoid it.
- The authors show that negative transfer directly relates to specific algorithms, domain divergence and target data.
- Experiments demonstrate these factors and the efficacy of the method.
- The authors' method consistently improves the performance of base methods and largely avoids negative transfer.
- Understanding negative transfer in more complex transfer tasks and settings should be addressed in future research.
Tables
- Table 1: Classification accuracy (%) of DANN and DANN_gate on tasks W→D and A→D. Perturbation rates are set equal, i.e. ε = ε_x = ε_y. NTG_1 and NTG_2 are the negative transfer gaps for DANN and DANN_gate, respectively. Δ is the performance gain of DANN_gate over DANN (a sketch of one plausible perturbation protocol follows these captions)
- Table 2: Classification accuracy (%) of state-of-the-art methods on four benchmark datasets, with the negative transfer gap shown in brackets. Perturbation rates are fixed at ε_x = ε_y = 0.7. The target labeled ratio is set at L% = 10%, and each task is further restricted to at most 3 labeled target samples per class
- Table 3: Classification accuracy (%) under the covariate shift assumption on task W→D. ε_y is fixed at 0. The negative transfer gap is shown in brackets
- Table 4: Ablation study on task A→D. DANN_gate-only applies only the discriminator gate, while DANN_label-only uses only label information without the gate. DANN_joint is a variant of DANN_gate whose feature network matches only the joint distribution (last two lines of Eq. 10), DANN_marginal matches only the marginal distribution, and DANN_none matches neither
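The perturbation rates ε_x and ε_y referenced in the captions above control how strongly the source data is corrupted. The sketch below shows one plausible protocol consistent with those captions, replacing an ε_y fraction of source labels with random classes and an ε_x fraction of source inputs with random noise; the paper's exact noise model may differ, and all names here are illustrative.

```python
import numpy as np


def perturb_source(images, labels, eps_x, eps_y, num_classes, seed=0):
    """Corrupt copies of the source data at rates eps_x (inputs) and eps_y (labels).

    Assumes float images scaled to [0, 1]; one plausible protocol, not
    necessarily the paper's exact noise model.
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n = len(labels)

    # Replace an eps_y fraction of labels with uniformly random classes.
    flip_idx = rng.choice(n, size=int(eps_y * n), replace=False)
    labels[flip_idx] = rng.integers(0, num_classes, size=len(flip_idx))

    # Replace an eps_x fraction of inputs with uniform random noise.
    noise_idx = rng.choice(n, size=int(eps_x * n), replace=False)
    images[noise_idx] = rng.random(images[noise_idx].shape).astype(images.dtype)

    return images, labels
```

Setting eps_x = eps_y reproduces the equal-rate setting ε = ε_x = ε_y used in Table 1.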
Related work
- Transfer learning [20, 36] uses knowledge learned in the source domain to assist training in the target domain. Early methods exploit conventional statistical techniques such as instance weighting [14] and feature mapping [19, 32]. Compared to these earlier approaches, deep transfer networks achieve better results in discovering domain-invariant factors [37]. Some deep methods [16, 27] transfer via distribution-mismatch measures such as Maximum Mean Discrepancy (MMD) [14]. More recent works [9, 29, 3, 26] exploit generative adversarial networks (GANs) [12] and add a subnetwork as a domain discriminator. These methods achieve state-of-the-art results on computer vision tasks [26] and some natural language processing tasks [17]. However, none of these techniques is specifically designed to tackle the problem of negative transfer.
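Since several of the cited methods [16, 27] match distributions via Maximum Mean Discrepancy, a minimal single-kernel MMD estimator is sketched below for orientation; the cited works use their own (often multi-kernel) formulations, and the names here are illustrative.

```python
import torch


def rbf_mmd2(source_feats, target_feats, sigma=1.0):
    """Biased estimate of squared MMD with an RBF kernel (illustrative).

    source_feats: (n, d) tensor, target_feats: (m, d) tensor.
    """
    def k(a, b):
        # Gaussian kernel on pairwise squared Euclidean distances.
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))

    return (k(source_feats, source_feats).mean()
            + k(target_feats, target_feats).mean()
            - 2 * k(source_feats, target_feats).mean())
```

A small value indicates that source and target features are hard to tell apart, which is the quantity such methods drive down during training.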
References
- A. Azulay and Y. Weiss. Why do deep convolutional networks generalize so poorly to small image transformations? arXiv preprint arXiv:1805.12177, 2018.
- S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira. Analysis of representations for domain adaptation. In Advances in Neural Information Processing Systems (NIPS), pages 137– 144, 2007.
- Z. Cao, M. Long, J. Wang, and M. I. Jordan. Partial transfer learning with selective adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
- Z. Cao, L. Ma, M. Long, and J. Wang. Partial adversarial domain adaptation. In Proceedings of the European Conference on Computer Vision (ECCV), 2018.
- C. Cortes, Y. Mansour, and M. Mohri. Learning bounds for importance weighting. In Advances in neural information processing systems, pages 442–450, 2010.
- J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. Decaf: A deep convolutional activation feature for generic visual recognition. In International conference on machine learning, pages 647–655, 2014.
- L. Duan, D. Xu, and S.-F. Chang. Exploiting web images for event recognition in consumer videos: A multiple source domain adaptation approach. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 1338–1345. IEEE, 2012.
- Y. Ganin and V. Lempitsky. Unsupervised domain adaptation by backpropagation. In International Conference on Machine Learning, pages 1180–1189, 2015.
- Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.
- L. Ge, J. Gao, H. Ngo, K. Li, and A. Zhang. On handling negative transfer and imbalanced distributions in multiple source transfer learning. Statistical Analysis and Data Mining: The ASA Data Science Journal, 7(4):254–271, 2014.
- M. Gong, K. Zhang, T. Liu, D. Tao, C. Glymour, and B. Scholkopf. Domain adaptation with conditional transferable components. In International conference on machine learning, pages 2839–2848, 2016.
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
- K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
- J. Huang, A. Gretton, K. M. Borgwardt, B. Scholkopf, and A. J. Smola. Correcting sample selection bias by unlabeled data. In Advances in Neural Information Processing Systems (NIPS), pages 601–608, 2007.
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
- M. Long, Y. Cao, J. Wang, and M. I. Jordan. Learning transferable features with deep adaptation networks. In Proceedings of the 32nd International Conference on International Conference on Machine Learning, 2015.
- S. Moon and J. Carbonell. Completely heterogeneous transfer learning with attention-what and what not to transfer. In International Joint Conference on Artificial Intelligence (IJCAI), pages 2508–2514, 2017.
- Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
- S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 22(2):199–210, 2011.
- S. J. Pan and Q. Yang. A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10):1345–1359, 2010.
- Z. Pei, Z. Cao, M. Long, and J. Wang. Multi-adversarial domain adaptation. In AAAI Conference on Artificial Intelligence, 2018.
- X. Peng, B. Usman, N. Kaushik, J. Hoffman, D. Wang, and K. Saenko. Visda: The visual domain adaptation challenge, 2017.
- E. Real, J. Shlens, S. Mazzocchi, X. Pan, and V. Vanhoucke. Youtube-boundingboxes: A large high-precision human-annotated data set for object detection in video. In Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, pages 7464–7473. IEEE, 2017.
- M. T. Rosenstein, Z. Marx, L. P. Kaelbling, and T. G. Dietterich. To transfer or not to transfer. In NIPS 2005 workshop on transfer learning, volume 898, pages 1–4, 2005.
- K. Saenko, B. Kulis, M. Fritz, and T. Darrell. Adapting visual category models to new domains. In European Conference on Computer Vision, pages 213–226. Springer, 2010.
- S. Sankaranarayanan, Y. Balaji, C. D. Castillo, and R. Chellappa. Generate to adapt: Aligning domains using generative adversarial networks. Computer Vision and Pattern Recognition (CVPR), 2018.
- B. Sun and K. Saenko. Deep CORAL: Correlation alignment for deep domain adaptation. In European Conference on Computer Vision Workshops, pages 443–450, 2016.
- C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. ICLR, 2014.
- E. Tzeng, J. Hoffman, T. Darrell, and K. Saenko. Simultaneous deep transfer across domains and tasks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 4068–4076, 2015.
- E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell. Adversarial discriminative domain adaptation. In Computer Vision and Pattern Recognition (CVPR), volume 1, page 4, 2017.
- M. Uehara, I. Sato, M. Suzuki, K. Nakayama, and Y. Matsuo. Generative adversarial nets from a density ratio estimation perspective. arXiv preprint arXiv:1610.02920, 2016.
- S. Uguroglu and J. Carbonell. Feature selection for transfer learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 430–442, 2011.
- H. Venkateswara, J. Eusebio, S. Chakraborty, and S. Panchanathan. Deep hashing network for unsupervised domain adaptation. In (IEEE) Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
- Z. Wang and J. Carbonell. Towards more reliable transfer learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 794–810, 2018.
- K. Weiss, T. M. Khoshgoftaar, and D. Wang. A survey of transfer learning. Journal of Big Data, 3(1):9, 2016.
- L. Yang, S. Hanneke, and J. Carbonell. A theory of transfer learning with applications to active learning. Machine learning, 90(2):161–189, 2013.
- J. Yosinski, J. Clune, Y. Bengio, and H. Lipson. How transferable are features in deep neural networks? In Advances in neural information processing systems, pages 3320–3328, 2014.
- Y.-L. Yu and C. Szepesvari. Analysis of kernel mean matching under covariate shift. In Proceedings of the 29th International Conference on Machine Learning, pages 1147–1154, 2012.
- A. R. Zamir, A. Sax, W. Shen, L. Guibas, J. Malik, and S. Savarese. Taskonomy: Disentangling task transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3712–3722, 2018.