Unsupervised Model Personalization While Preserving Privacy and Scalability: An Open Problem

CVPR, pp. 14451-14460, 2020.

Keywords:
Incremental Moment Matching, deep neural network, user device, indoor scene, continual learning
Weibo:
Local user-adaptation with a data-regularization approach based on adaptive Batch Normalization, and especially its supervised variant, seems more promising, leading to systematic improvements when taking advantage of labeled user-specific data

Abstract:

This work investigates the task of unsupervised model personalization, adapted to continually evolving, unlabeled local user images. We consider the practical scenario where a high capacity server interacts with a myriad of resource-limited edge devices, imposing strong requirements on scalability and local data privacy. We aim to addre...

Introduction
  • High-performing deep neural network models come with considerable data requirements: high-capacity models are trained on large amounts of labeled data.
  • User data cannot be shared directly due to rigorous privacy constraints.
  • This motivates the need to separate supervised model training on the server from local adaptation to a user’s unlabeled personal data.
  • A personalized user model that performs tasks locally has the additional benefit of relaxed connectivity requirements.
Highlights
  • Data availability and increased hardware efficiency have made neural networks thrive in a wide range of tasks, rivaling human-level performance [13].
  • High-performing deep neural network models come with considerable data requirements: high-capacity models are trained on large amounts of labeled data
  • We proposed a practical Dual User-Adaptation (DUA) framework to tackle incremental domain adaptation in real-life scenarios with numerous users
  • This novel user-adaptation paradigm disentangles personalization over both the server and the local user device, and combines desirable user privacy and scalability properties, which remain largely unexplored in the literature
  • Local user-adaptation with a data-regularization approach based on adaptive Batch Normalization (AdaBN), and especially its supervised variant (AdaBN-S), seems more promising, leading to systematic improvements when taking advantage of labeled user-specific data (a minimal sketch of the unsupervised AdaBN step follows this list)
  • User privacy and experience are of major concern, for which our Dual User-Adaptation (DUA) framework forges a principled foundation for dual user-adaptation, aspiring to promote further research in this direction
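
To make the AdaBN step above concrete, the following is a minimal sketch of unsupervised local adaptation via adaptive Batch Normalization, assuming a PyTorch model and a hypothetical user_loader over the user's unlabeled images. It only re-estimates BatchNorm statistics with forward passes and is not the authors' implementation; the supervised AdaBN-S variant, which additionally exploits labeled user data, is not shown.

    import torch
    import torch.nn as nn

    @torch.no_grad()
    def adabn_adapt(model: nn.Module, user_loader, device="cpu"):
        """Re-estimate BatchNorm running statistics on unlabeled user data.

        All learned weights stay frozen; only forward passes are needed,
        so no labels and no backpropagation are required on the edge device.
        """
        model.to(device).train()  # train() lets BN layers update running stats
        for m in model.modules():
            if isinstance(m, nn.modules.batchnorm._BatchNorm):
                m.reset_running_stats()  # discard the server-data statistics
                m.momentum = None        # None -> cumulative moving average
        for batch in user_loader:
            images = batch[0] if isinstance(batch, (tuple, list)) else batch
            model(images.to(device))     # forward pass only: updates BN stats
        return model.eval()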
Methods
  • The compared methods are MAS-IMM, FIM-IMM, MAS, EWC, LwF, and Joint training, each optionally extended with unsupervised (AdaBN) or supervised (AdaBN-S) local user adaptation; Table 1 compares them on the user-adaptive (Adapt., on the server ψ or the local device φ), unsupervised (Unsup.), scalable (Scal.), and privacy-preserving (Priv.) properties.
Results
  • The authors report average accuracy and forgetting on the final model after training all tasks (both metrics are restated after this list).
  • This final post-merging model is either user-specific or the general server model.
  • Results are averaged over all users.
  • Methods can be subdivided into user-specific and user-agnostic approaches.
  • Table 1 summarizes all method features in the user-adaptive setting.
  • User-specific methods adapt to the local user-validation set, resulting in a personalized model
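
For reference, the two metrics mentioned above are commonly defined as follows; this is the standard continual-learning formulation, restated here with a_{k,i} the accuracy on task i after training up to task k and T the number of tasks, and the paper may use an equivalent variant.

    % Average accuracy over all tasks, evaluated with the final model:
    A_T = \frac{1}{T} \sum_{i=1}^{T} a_{T,i}

    % Average forgetting: drop from the best accuracy ever reached on task i:
    F_T = \frac{1}{T-1} \sum_{i=1}^{T-1}
          \Big( \max_{k \in \{i,\dots,T-1\}} a_{k,i} \; - \; a_{T,i} \Big)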
Conclusion
  • The authors proposed a practical Dual User-Adaptation (DUA) framework to tackle incremental domain adaptation in real-life scenarios with numerous users.
  • This novel user-adaptation paradigm disentangles personalization over both the server and the local user device, and combines desirable user privacy and scalability properties, which remain largely unexplored in the literature.
  • Adapting models on the server following RACL satisfies these scalability, privacy, and supervision requirements, yet in practice yielded only marginal improvement over a user-agnostic model, as the gradient-based importance weights turned out to be largely data-independent.
Tables
  • Table 1: Qualitatively comparing features: user-adaptive (Adapt.), unsupervised (Unsup.), scalable (Scal.) and privacy-preserving (Priv.). DUA subdivides adaptation on the server (ψ) and the local user device (φ), with MAS importance weights discarding supervision. Scalability for user-adaptive methods implies training independent of the number of users L. Shared user data can be raw data dl, gradients of the output function F(x; θ), or gradients of the loss L(x, y; θ). All methods can be extended with unsupervised (AdaBN) and supervised (AdaBN-S) local user adaptation φ
  • Table 2: Reporting average accuracy (forgetting) for IMM mode-merging with both unsupervised (MAS) and supervised (FIM) importance weights
  • Table 3: Left: Average accuracy (forgetting) for the three data setups and models, comparing user-specific (RACL) and user-agnostic (IMM) importance weights, both unsupervised (MAS-) and supervised (FIM-). RACL outperforming the corresponding IMM variant is indicated in bold. Right: Qualitatively comparing features: user-adaptive (Adapt.), unsupervised (Unsup.), scalable (Scal.) and privacy-preserving (Priv.)
  • Table 4: Results in the CatPrior and TransPrior setups with model VGG11-BN, comparing batch normalization on the server data (BN) with unsupervised (AdaBN) and supervised (AdaBN-S) user-adaptive variants
Related work
  • The DUA framework introduces a new paradigm for user adaptation on the server, resembling federated learning [23], although with an inverted purpose. Federated learning updates a common server model with an aggregated gradient from a distributed database, wherein each user constitutes a node providing local gradients. Similarly, DUA solely uses user-specific gradients to acquire better models, but attains decentralized, user-personalized models instead of a general, trend-following model. Our framework strongly reinforces overall user privacy, ensuring no sensitive raw user data has to be shared, and additionally tackles the challenging issue of scalability for millions of personalized neural networks.

    Further, sequentially learning multiple tasks by finetuning a neural network results in significant loss of previously acquired knowledge. Literature on continual learning largely addresses coping with this catastrophic forgetting [5, 25]. Nonetheless, recent works mainly focus on supervised data, leaving the richness of available unsupervised user data unused. Following [5], these methods can be subdivided into three main categories. First, parameter-isolation methods preserve task knowledge by obtaining task-specific masks [22, 21, 31], or by dynamically extending the architecture [30]. Second, replay methods preserve a subset of representative samples of previous tasks, replayed during training of new tasks. These exemplars can be raw images [20, 2, 29], or virtual samples retrieved from task-specific generative models [32]. Rao et al. [28] extend virtual replay to a completely unsupervised setting based on variational autoencoders. However, this would require exhaustive training on the user's low-capacity edge device with only a limited set of available user data, and is hence infeasible for user personalization. Finally, regularization-based methods impose a prior in the loss function when training the new task. Learning without forgetting (LwF) [17] minimizes a KL-divergence prior that keeps the new sample's outputs close to those of the previous-task model, hence distilling previous-task knowledge [8]. Further work [27] extends this idea with task-specific autoencoders, additionally penalizing new-task features for drifting away from features deemed important for previous tasks. Elastic Weight Consolidation (EWC) [11] introduces a prior on previous-task parameters in a sequential Bayesian framework, Laplace-approximated by a Gaussian with a diagonally assumed Fisher information matrix (FIM) as precision. As the FIM is estimated in the task optimum, Zenke et al. [35] propose an online approach that estimates precision during training instead. Furthermore, the FIM relies on the loss gradient ∇L, whereas MAS [1] sidesteps this supervised loss dependency by relying on the output gradient ∇F instead. IMM [15] differs from the previously discussed methods in first preserving the trained task models, which are subsequently merged using FIM importance weights or by averaging.
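
    For context, the importance-based regularizers and the merging rule discussed above take the following simplified forms in the cited works; the notation is restated here from those papers and may differ slightly from their exact formulations.

    % EWC [11]: penalize deviation from the previous-task optimum \theta^*_A,
    % weighted by the diagonal Fisher information F_i and strength \lambda:
    \mathcal{L}(\theta) = \mathcal{L}_B(\theta)
        + \frac{\lambda}{2} \sum_i F_i \, (\theta_i - \theta^*_{A,i})^2

    % MAS [1]: importance from the gradient of the squared output norm,
    % requiring no labels (N unlabeled samples x_n):
    \Omega_i = \frac{1}{N} \sum_{n=1}^{N}
        \left| \frac{\partial \, \| F(x_n;\theta) \|_2^2}{\partial \theta_i} \right|

    % Mode-IMM [15]: merge the K task models element-wise with Fisher-weighted
    % averaging (mixing ratios \alpha_k, \sum_k \alpha_k = 1):
    \theta^{1:K}_i = \frac{\sum_{k=1}^{K} \alpha_k F_{k,i} \, \theta_{k,i}}
                          {\sum_{k=1}^{K} \alpha_k F_{k,i}}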
Funding
  • The authors would like to thank Huawei for funding this research as part of the HIRP Open project.
References
  • [1] Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, and Tinne Tuytelaars. Memory aware synapses: Learning what (not) to forget. In Proceedings of the European Conference on Computer Vision (ECCV), pages 139–154, 2018.
  • [2] Arslan Chaudhry, Marc'Aurelio Ranzato, Marcus Rohrbach, and Mohamed Elhoseiny. Efficient lifelong learning with A-GEM. In ICLR, 2019.
  • [3] Brian Cheung, Alex Terekhov, Yubei Chen, Pulkit Agrawal, and Bruno Olshausen. Superposition of many models into one. arXiv preprint arXiv:1902.05522, 2019.
  • [4] Yi-Min Chou, Yi-Ming Chan, Jia-Hong Lee, Chih-Yi Chiu, and Chu-Song Chen. Unifying and merging well-trained deep neural networks for inference stage. arXiv preprint arXiv:1805.04980, 2018.
  • [5] Matthias De Lange, Rahaf Aljundi, Marc Masana, Sarah Parisot, Xu Jia, Ales Leonardis, Gregory Slabaugh, and Tinne Tuytelaars. Continual learning: A comparative study on how to defy forgetting in classification tasks. arXiv preprint arXiv:1909.08383, 2019.
  • [6] Yaroslav Ganin and Victor Lempitsky. Unsupervised domain adaptation by backpropagation. arXiv preprint arXiv:1409.7495, 2014.
  • [7] Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. arXiv preprint arXiv:1903.12261, 2019.
  • [8] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
  • [9] Judy Hoffman, Eric Tzeng, Taesung Park, Jun-Yan Zhu, Phillip Isola, Kate Saenko, Alexei A. Efros, and Trevor Darrell. CyCADA: Cycle-consistent adversarial domain adaptation. arXiv preprint arXiv:1711.03213, 2017.
  • [10] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
  • [11] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017.
  • [12] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
  • [13] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
  • [14] Yann LeCun and Corinna Cortes. MNIST handwritten digit database. 2010.
  • [15] Sang-Woo Lee, Jin-Hwa Kim, Jaehyun Jun, Jung-Woo Ha, and Byoung-Tak Zhang. Overcoming catastrophic forgetting by incremental moment matching. In Advances in Neural Information Processing Systems, pages 4652–4662, 2017.
  • [16] Yanghao Li, Naiyan Wang, Jianping Shi, Jiaying Liu, and Xiaodi Hou. Revisiting batch normalization for practical domain adaptation. arXiv preprint arXiv:1603.04779, 2016.
  • [17] Zhizhong Li and Derek Hoiem. Learning without forgetting. In ECCV, pages 614–629. Springer, 2016.
  • [18] Zhizhong Li and Derek Hoiem. Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):2935–2947, 2017.
  • [19] Mingsheng Long, Han Zhu, Jianmin Wang, and Michael I. Jordan. Unsupervised domain adaptation with residual transfer networks. In Advances in Neural Information Processing Systems, pages 136–144, 2016.
  • [20] David Lopez-Paz et al. Gradient episodic memory for continual learning. In NeurIPS, pages 6470–6479, 2017.
  • [21] Arun Mallya, Dillon Davis, and Svetlana Lazebnik. Piggyback: Adapting a single network to multiple tasks by learning to mask weights. In ECCV, pages 67–82, 2018.
  • [22] Arun Mallya and Svetlana Lazebnik. PackNet: Adding multiple tasks to a single network by iterative pruning. In CVPR, pages 7765–7773, 2018.
  • [23] H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, et al. Communication-efficient learning of deep networks from decentralized data. arXiv preprint arXiv:1602.05629, 2016.
  • [24] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. 2011.
  • [25] German I. Parisi, Ronald Kemker, Jose L. Part, Christopher Kanan, and Stefan Wermter. Continual lifelong learning with neural networks: A review. Neural Networks, 2019.
  • [26] Ariadna Quattoni and Antonio Torralba. Recognizing indoor scenes. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 413–420. IEEE, 2009.
  • [27] Amal Rannen, Rahaf Aljundi, Matthew B. Blaschko, and Tinne Tuytelaars. Encoder based lifelong learning. In ICCV, pages 1320–1328, 2017.
  • [28] Dushyant Rao, Francesco Visin, Andrei Rusu, Razvan Pascanu, Yee Whye Teh, and Raia Hadsell. Continual unsupervised representation learning. In Advances in Neural Information Processing Systems, pages 7645–7655, 2019.
  • [29] Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H. Lampert. iCaRL: Incremental classifier and representation learning. In CVPR, pages 2001–2010, 2017.
  • [30] Andrei A. Rusu, Neil C. Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, and Raia Hadsell. Progressive neural networks. arXiv preprint arXiv:1606.04671, 2016.
  • [31] Joan Serrà, Dídac Surís, Marius Miron, and Alexandros Karatzoglou. Overcoming catastrophic forgetting with hard attention to the task. arXiv preprint arXiv:1801.01423, 2018.
  • [32] Hanul Shin, Jung Kwon Lee, Jaehong Kim, and Jiwon Kim. Continual learning with deep generative replay. In NeurIPS, pages 2990–2999, 2017.
  • [33] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • [34] Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. Adversarial discriminative domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7167–7176, 2017.
  • [35] Friedemann Zenke, Ben Poole, and Surya Ganguli. Continual learning through synaptic intelligence. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 3987–3995. JMLR.org, 2017.