Characterizing and Avoiding Negative Transfer

Computer Vision and Pattern Recognition (CVPR), 2019.

Keywords:
unsupervised domain adaptation, labeled data, target domain, generative adversarial network, source task

Abstract:

When labeled data is scarce for a specific target task, transfer learning often offers an effective solution by utilizing data from a related source task. However, when transferring knowledge from a less related source, it may inversely hurt the target performance, a phenomenon known as negative transfer. Despite its pervasiveness, negative transfer is usually described in an informal manner, lacking rigorous definition, careful analysis, or systematic treatment.

Introduction
  • The development of deep neural networks (DNNs) has improved the state-of-the-art performance on a wide range of machine learning problems and applications.
  • Transfer learning leverages data from a related source domain; the hope is that this source domain is related to the target domain and that transferring knowledge from the source can improve performance within the target domain.
  • This powerful paradigm has been studied under various settings [35] and has been proven effective in a wide range of applications [39, 16, 17]
Highlights
  • The development of deep neural networks (DNNs) has improved the state-of-the-art performance on a wide range of machine learning problems and applications
  • To reveal the three dependent factors, we study the effect of negative transfer under different methods with varying perturbation rates (ε) and target labeled data ratios (L%)
  • The performance of the Domain Adversarial Neural Network (DANN) under different settings of ε and L% on two tasks of Office-31 is shown in Table 1
  • Although DANN is better than the target-only baseline DANN_T when ε is small, its performance degrades quickly as ε increases and drops below DANN_T, indicating the occurrence of negative transfer (a formal statement of this notion is sketched after this list)
  • We show that negative transfer directly relates to specific algorithms, domain divergence and target data
  • Our method consistently improves the performance of base methods and largely avoids negative transfer
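    As a reader's aid, the notion of negative transfer used above can be stated formally. The notation below is a hedged reconstruction from this summary, not a verbatim copy of the paper's own equations:

        % Let A(S, T) denote the model an algorithm A learns from source data S and
        % target data T, and let \epsilon_T(\cdot) denote its test error on the target
        % domain.  Transfer is negative when using the source hurts relative to using
        % the target data alone, and the negative transfer gap (NTG) measures the size
        % of that effect:
        \epsilon_T\big(A(S, T)\big) \;>\; \epsilon_T\big(A(\emptyset, T)\big),
        \qquad
        \mathrm{NTG} \;=\; \epsilon_T\big(A(S, T)\big) - \epsilon_T\big(A(\emptyset, T)\big) \;>\; 0.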
Methods
  • As discussed in Section 3, the key to achieving successful transfer and avoiding negative effects is to discover and exploit shared underlying structures between P_S(X, Y) and P_T(X, Y); a hedged sketch of the discriminator-gate idea follows this list.
  • The authors can see that while both DANN and DANN_T perform better with more labeled target data, DANN is affected by the divergence factor and is outperformed by DANN_T when L% becomes larger.
  • This observation shows that negative transfer is relative and depends on the amount of labeled target data
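    As a rough illustration of the discriminator-gate idea (a minimal sketch, not the authors' released implementation; module names, feature dimensions, and the exact weighting rule are assumptions), the output of a DANN-style domain discriminator that also sees the label can be turned into a per-example weight on the source classification loss:

        # Hedged sketch: gate source examples by an estimated density ratio
        # p_T(x, y) / p_S(x, y) obtained from a domain discriminator.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        feat = nn.Sequential(nn.Linear(256, 128), nn.ReLU())       # feature extractor
        clf = nn.Linear(128, 31)                                    # label classifier (31 Office-31 classes)
        disc = nn.Sequential(nn.Linear(128 + 31, 64), nn.ReLU(),
                             nn.Linear(64, 1))                      # domain discriminator D(x, y)

        def gated_source_loss(xs, ys):
            """Classification loss on source data, re-weighted by the discriminator gate."""
            zs = feat(xs)
            logits = clf(zs)
            y_onehot = F.one_hot(ys, num_classes=31).float()
            d = torch.sigmoid(disc(torch.cat([zs, y_onehot], dim=1))).squeeze(1)
            w = (d / (1.0 - d + 1e-6)).detach()                     # density-ratio style gate
            w = w / (w.mean() + 1e-6)                               # normalize weights within the batch
            per_example = F.cross_entropy(logits, ys, reduction="none")
            return (w * per_example).mean()

        # toy usage: a batch of 256-d source features with labels in 0..30
        xs, ys = torch.randn(8, 256), torch.randint(0, 31, (8,))
        gated_source_loss(xs, ys).backward()

    The gate down-weights source examples that the discriminator judges unlikely under the target joint distribution, which matches the stated intuition of filtering out unrelated source data.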
Results
  • Results and Analysis

    5.3.1 Study of Negative Transfer

    To reveal the three dependent factors, the authors study the effect of negative transfer under different methods with varying perturbation rates (ε_x, ε_y) and target labeled data ratios (L%).

    Divergence factor.
  • By fixing ε_y = 0 and using two domains, W and D, that are known to be similar, the authors study negative transfer under the covariate-shift assumption in Table 3, and observe that negative transfer does not occur even with a high ε_x and a decent L% (a hedged sketch of the perturbation protocol is given after this list)
  • These experimental results confirm that distribution divergence is an important factor in negative transfer
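    Since the perturbation protocol is only described abstractly here, the following is a minimal sketch (assumed for illustration; the paper's exact input corruption may differ) of how a source domain can be made less related by corrupting a fraction ε_x of its inputs and re-labeling a fraction ε_y of its examples at random. Setting ε_y = 0 perturbs only the inputs, which approximates the covariate-shift setting of Table 3 where the conditional P(Y|X) is shared but the marginal P(X) differs:

        # Hedged sketch of an (eps_x, eps_y) perturbation of the source domain.
        import numpy as np

        def perturb_source(images, labels, eps_x, eps_y, num_classes, rng=None):
            rng = rng or np.random.default_rng(0)
            images, labels = images.copy(), labels.copy()
            n = len(images)
            # input perturbation: corrupt a fraction eps_x of the source images
            idx_x = rng.choice(n, size=int(eps_x * n), replace=False)
            images[idx_x] = rng.uniform(0.0, 1.0, size=images[idx_x].shape)
            # label perturbation: assign random labels to a fraction eps_y of the examples
            idx_y = rng.choice(n, size=int(eps_y * n), replace=False)
            labels[idx_y] = rng.integers(0, num_classes, size=len(idx_y))
            return images, labels

        # toy usage mirroring Table 2's setting (eps_x = eps_y = 0.7, 31 classes)
        imgs = np.random.rand(100, 32, 32, 3)
        lbls = np.random.randint(0, 31, size=100)
        p_imgs, p_lbls = perturb_source(imgs, lbls, eps_x=0.7, eps_y=0.7, num_classes=31)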
Conclusion
  • The authors analyze the problem of negative transfer and propose a novel discriminator gate technique to avoid it.
  • The authors show that negative transfer directly relates to specific algorithms, domain divergence and target data.
  • Experiments demonstrate these factors and the efficacy of the method.
  • The authors' method consistently improves the performance of base methods and largely avoids negative transfer.
  • Understanding negative transfer in more complex transfer tasks and settings should be addressed in future research
Tables
  • Table 1: Classification accuracy (%) of DANN and DANN_gate on tasks W→D and A→D. Perturbation rates are set equal, i.e. ε = ε_x = ε_y. NTG_1 and NTG_2 are the negative transfer gaps for DANN and DANN_gate. ∆ is the performance gain of DANN_gate compared to DANN
  • Table 2: Classification accuracy (%) of state-of-the-art methods on four benchmark datasets, with the negative transfer gap shown in brackets. Perturbation rates are fixed at ε_x = ε_y = 0.7. The target labeled ratio is set at L% = 10%, and each task is further restricted to at most 3 labeled target samples per class
  • Table 3: Classification accuracy (%) under the covariate-shift assumption on task W→D. ε_y is fixed at 0. The negative transfer gap is shown in brackets
  • Table 4: Ablation study on task A→D. DANN_gate-only applies only the discriminator gate, while DANN_label-only uses only the label information without the gate. DANN_joint is a variant of DANN_gate in which the feature network matches only the joint distribution (last two lines of Eq. 10), DANN_marginal matches only the marginal distribution, and DANN_none matches neither
Related work
  • Transfer learning [20, 36] uses knowledge learned in the source domain to assist training in the target domain. Early methods exploit conventional statistical techniques such as instance weighting [14] and feature mapping [19, 32]. Compared to these earlier approaches, deep transfer networks achieve better results in discovering domain-invariant factors [37]. Some deep methods [16, 27] transfer via distribution (mis)match measurements such as Maximum Mean Discrepancy (MMD) [14]. More recent works [9, 29, 3, 26] exploit generative adversarial networks (GANs) [12] and add a subnetwork as a domain discriminator. These methods achieve state-of-the-art results on computer vision tasks [26] and some natural language processing tasks [17]. However, none of these techniques are specifically designed to tackle the problem of negative transfer.
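    For readers unfamiliar with the Maximum Mean Discrepancy mentioned above, a minimal sketch with a Gaussian (RBF) kernel is given below; the bandwidth and the simple biased estimator are illustrative assumptions, not the choices of any particular cited method:

        # Hedged sketch: biased estimate of squared MMD between two feature sets.
        import numpy as np

        def rbf_kernel(a, b, sigma=1.0):
            d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
            return np.exp(-d2 / (2.0 * sigma ** 2))

        def mmd2(source, target, sigma=1.0):
            k_ss = rbf_kernel(source, source, sigma).mean()
            k_tt = rbf_kernel(target, target, sigma).mean()
            k_st = rbf_kernel(source, target, sigma).mean()
            return k_ss + k_tt - 2.0 * k_st

        # toy usage: two sets of 64-d features drawn from slightly shifted Gaussians
        src = np.random.randn(200, 64)
        tgt = np.random.randn(200, 64) + 0.5
        print(mmd2(src, tgt))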
Reference
  • A. Azulay and Y. Weiss. Why do deep convolutional networks generalize so poorly to small image transformations? arXiv preprint arXiv:1805.12177, 2018.
  • S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira. Analysis of representations for domain adaptation. In Advances in Neural Information Processing Systems (NIPS), pages 137– 144, 2007.
  • Z. Cao, M. Long, J. Wang, and M. I. Jordan. Partial transfer learning with selective adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • Z. Cao, L. Ma, M. Long, and J. Wang. Partial adversarial domain adaptation. In European Conference on Computer Vision (ECCV), 2018.
  • C. Cortes, Y. Mansour, and M. Mohri. Learning bounds for importance weighting. In Advances in neural information processing systems, pages 442–450, 2010.
  • J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. Decaf: A deep convolutional activation feature for generic visual recognition. In International conference on machine learning, pages 647–655, 2014.
  • L. Duan, D. Xu, and S.-F. Chang. Exploiting web images for event recognition in consumer videos: A multiple source domain adaptation approach. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 1338–1345. IEEE, 2012.
  • Y. Ganin and V. Lempitsky. Unsupervised domain adaptation by backpropagation. In International Conference on Machine Learning, pages 1180–1189, 2015.
  • Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.
  • L. Ge, J. Gao, H. Ngo, K. Li, and A. Zhang. On handling negative transfer and imbalanced distributions in multiple source transfer learning. Statistical Analysis and Data Mining: The ASA Data Science Journal, 7(4):254–271, 2014.
  • M. Gong, K. Zhang, T. Liu, D. Tao, C. Glymour, and B. Scholkopf. Domain adaptation with conditional transferable components. In International conference on machine learning, pages 2839–2848, 2016.
  • I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
  • K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • J. Huang, A. Gretton, K. M. Borgwardt, B. Scholkopf, and A. J. Smola. Correcting sample selection bias by unlabeled data. In Advances in Neural Information Processing Systems (NIPS), pages 601–608, 2007.
  • Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • M. Long, Y. Cao, J. Wang, and M. I. Jordan. Learning transferable features with deep adaptation networks. In Proceedings of the 32nd International Conference on International Conference on Machine Learning, 2015.
  • S. Moon and J. Carbonell. Completely heterogeneous transfer learning with attention-what and what not to transfer. In International Joint Conference on Artificial Intelligence (IJCAI), pages 2508–2514, 2017.
  • Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
  • S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 22(2):199–210, 2011.
  • S. J. Pan and Q. Yang. A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10):1345–1359, 2010.
  • Z. Pei, Z. Cao, M. Long, and J. Wang. Multi-adversarial domain adaptation. In AAAI Conference on Artificial Intelligence, 2018.
  • X. Peng, B. Usman, N. Kaushik, J. Hoffman, D. Wang, and K. Saenko. Visda: The visual domain adaptation challenge, 2017.
  • E. Real, J. Shlens, S. Mazzocchi, X. Pan, and V. Vanhoucke. Youtube-boundingboxes: A large high-precision human-annotated data set for object detection in video. In Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, pages 7464–7473. IEEE, 2017.
  • M. T. Rosenstein, Z. Marx, L. P. Kaelbling, and T. G. Dietterich. To transfer or not to transfer. In NIPS 2005 workshop on transfer learning, volume 898, pages 1–4, 2005.
  • K. Saenko, B. Kulis, M. Fritz, and T. Darrell. Adapting visual category models to new domains. In European Conference on Computer Vision, pages 213–226, 2010.
  • S. Sankaranarayanan, Y. Balaji, C. D. Castillo, and R. Chellappa. Generate to adapt: Aligning domains using generative adversarial networks. Computer Vision and Pattern Recognition (CVPR), 2018.
  • B. Sun and K. Saenko. Deep coral: Correlation alignment for deep domain adaptation. In European Conference on Computer Vision Workshops, pages 443–450, 2016.
  • C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. ICLR, 2014.
  • E. Tzeng, J. Hoffman, T. Darrell, and K. Saenko. Simultaneous deep transfer across domains and tasks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 4068–4076, 2015.
  • E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell. Adversarial discriminative domain adaptation. In Computer Vision and Pattern Recognition (CVPR), volume 1, page 4, 2017.
  • M. Uehara, I. Sato, M. Suzuki, K. Nakayama, and Y. Matsuo. Generative adversarial nets from a density ratio estimation perspective. arXiv preprint arXiv:1610.02920, 2016.
  • S. Uguroglu and J. Carbonell. Feature selection for transfer learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 430–442, 2011.
  • H. Venkateswara, J. Eusebio, S. Chakraborty, and S. Panchanathan. Deep hashing network for unsupervised domain adaptation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  • Z. Wang and J. Carbonell. Towards more reliable transfer learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 794–810, 2018.
  • K. Weiss, T. M. Khoshgoftaar, and D. Wang. A survey of transfer learning. Journal of Big Data, 3(1):9, 2016.
  • L. Yang, S. Hanneke, and J. Carbonell. A theory of transfer learning with applications to active learning. Machine learning, 90(2):161–189, 2013.
  • J. Yosinski, J. Clune, Y. Bengio, and H. Lipson. How transferable are features in deep neural networks? In Advances in neural information processing systems, pages 3320–3328, 2014.
  • Y.-L. Yu and C. Szepesvari. Analysis of kernel mean matching under covariate shift. In Proceedings of the 29th International Conference on Machine Learning, pages 1147–1154, 2012.
  • A. R. Zamir, A. Sax, W. Shen, L. Guibas, J. Malik, and S. Savarese. Taskonomy: Disentangling task transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3712–3722, 2018.