RandAugment: Practical Automated Data Augmentation with a Reduced Search Space

NeurIPS 2020.

Keywords:
search phase, augmentation strategy, large scale, augmentation policy, data augmentation

Abstract:

Recent work on automated data augmentation strategies has led to state-of-the-art results in image classification and object detection. An obstacle to large-scale adoption of these methods is that they require a separate and expensive search phase. A common way to overcome the expense of the search phase was to use a smaller proxy task.

Introduction
  • Data augmentation is a widely used method to add additional knowledge when training vision models [30, 15, 5, 42], but the fact that it is manually designed makes it difficult to scale to new applications.
  • While the proxy task helps speed up the search process, it adds extra complexity to these methods and causes further issues.
Highlights
  • Data augmentation is a widely used method to add additional knowledge when training vision models [30, 15, 5, 42], but the fact that it is manually designed makes it difficult to scale to new applications
  • Learning data augmentation strategies from data has recently emerged as a new paradigm to automate the design of augmentation and has the potential to address some weaknesses of traditional data augmentation methods [3, 45, 13, 17]
  • While previous work focused on the search methodology [17, 13], our analysis shows that the search space plays a more significant role
  • Our goal is to demonstrate the relative benefits of employing this method over previous learned augmentation methods; the RandAugment model and the baseline model do not differ in any setting other than the data augmentation strategy
  • Data augmentation is a necessary method for achieving state-of-the-art performance [30, 15, 5, 42, 9, 26]
  • On EfficientNet-B7, the resulting model achieves 84.7% top-1 accuracy, a 1.0% improvement over the baseline augmentation
  • Learned data augmentation strategies have helped automate the design of such strategies and likewise achieved state-of-the-art results [3, 17, 13, 45]
Methods
  • To explore the space of data augmentations, the authors experiment with core image classification and object detection tasks.
  • CIFAR-10 has been extensively studied with previous data augmentation methods, so the authors first test this proposed method on it.
  • Results indicate that despite its simplicity, RandAugment achieves competitive results on CIFAR-10 across four network architectures (Table 2).
  • As a more challenging task, the authors compare the efficacy of RandAugment on CIFAR-100 for Wide-ResNet-28-2 and Wide-ResNet-28-10.
  • For Wide-ResNet-28-2 and Wide-ResNet-28-10, the authors find that N = 1, M = 2 and N = 2, M = 14 achieve the best results, respectively.
  • RandAugment achieves competitive or superior results compared to AutoAugment across both architectures (Table 2)
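Because RandAugment reduces the search space to the single pair (N, M), values such as N = 1, M = 2 above can be found with a naive grid search rather than a learned policy. A minimal sketch, where `grid_search_randaugment` and the `train_and_eval(n, m)` callback are hypothetical names — the callback is assumed to train a model with RandAugment(N, M) and return its validation accuracy:

```python
import itertools

def grid_search_randaugment(train_and_eval, n_values=(1, 2, 3),
                            m_values=range(0, 31)):
    # Evaluate every (N, M) pair and keep the one with the highest
    # validation accuracy reported by the (hypothetical) callback.
    return max(itertools.product(n_values, m_values),
               key=lambda nm: train_and_eval(*nm))
```

With so few candidate pairs, this exhaustive sweep replaces the separate, expensive search phase of earlier learned-augmentation methods.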
Results
  • The authors' method achieves state-of-the-art performance on CIFAR-10, SVHN, and ImageNet. On EfficientNet-B7, the authors achieve 84.7% accuracy, a 1.0% increase over baseline augmentation and a 0.4% improvement over AutoAugment on the ImageNet dataset.
  • The same method used for classification leads to 1.0-1.3% improvement over the baseline augmentation method on COCO.
  • For Wide-ResNet-28-2, applying RandAugment to the core training dataset improves performance more than augmenting with 531K additional training images (98.3% vs 98.2%).
  • Even with only two transformations, RandAugment leads to more than 1% improvement in validation accuracy on average
Conclusion
  • Data augmentation is a necessary method for achieving state-of-the-art performance [30, 15, 5, 42, 9, 26].
  • Failing to tailor the number of distortions and the distortion magnitude to the dataset size or the model size leads to sub-optimal performance.
  • To remedy this situation, the authors propose a simple parameterization for targeting augmentation to particular model and dataset sizes.
  • The authors demonstrate that RandAugment is competitive with or outperforms previous approaches [3, 17, 13, 45] on CIFAR-10/100, SVHN, ImageNet and COCO without a separate search for data augmentation policies
Summary
  • Introduction:

    Data augmentation is a widely used method to add additional knowledge when training vision models [30, 15, 5, 42], but the fact that it is manually designed makes it difficult to scale to new applications.
  • While the proxy task helps speed up the search process, it adds extra complexity to these methods and causes further issues.
  • Objectives:

    The authors aim to make AutoAugment and related methods [3, 13, 17] more practical. The authors' goal is to demonstrate the relative benefits of employing this method over previous learned augmentation methods; the RandAugment model and the baseline model do not differ in any setting other than the data augmentation strategy.
  • Methods:

    To explore the space of data augmentations, the authors experiment with core image classification and object detection tasks.
  • CIFAR-10 has been extensively studied with previous data augmentation methods, so the authors first test this proposed method on it.
  • Results indicate that despite its simplicity, RandAugment achieves competitive results on CIFAR-10 across four network architectures (Table 2).
  • As a more challenging task, the authors compare the efficacy of RandAugment on CIFAR-100 for Wide-ResNet-28-2 and Wide-ResNet-28-10.
  • For Wide-ResNet-28-2 and Wide-ResNet-28-10, the authors find that N = 1, M = 2 and N = 2, M = 14 achieve the best results, respectively.
  • RandAugment achieves competitive or superior results compared to AutoAugment across both architectures (Table 2)
  • Results:

    The authors' method achieves state-of-the-art performance on CIFAR-10, SVHN, and ImageNet. On EfficientNet-B7, the authors achieve 84.7% accuracy, a 1.0% increase over baseline augmentation and a 0.4% improvement over AutoAugment on the ImageNet dataset.
  • The same method used for classification leads to 1.0-1.3% improvement over the baseline augmentation method on COCO.
  • For Wide-ResNet-28-2, applying RandAugment to the core training dataset improves performance more than augmenting with 531K additional training images (98.3% vs 98.2%).
  • Even with only two transformations, RandAugment leads to more than 1% improvement in validation accuracy on average
  • Conclusion:

    Data augmentation is a necessary method for achieving state-of-the-art performance [30, 15, 5, 42, 9, 26].
  • Failing to tailor the number of distortions and the distortion magnitude to the dataset size or the model size leads to sub-optimal performance.
  • To remedy this situation, the authors propose a simple parameterization for targeting augmentation to particular model and dataset sizes.
  • The authors demonstrate that RandAugment is competitive with or outperforms previous approaches [3, 17, 13, 45] on CIFAR-10/100, SVHN, ImageNet and COCO without a separate search for data augmentation policies
Tables
  • Table1: Simple grid search on a vastly reduced search space matches or exceeds predictive performance of other augmentation methods. We report the search space size, and the test accuracy achieved for AutoAugment (AA) [3], Fast AutoAugment [17], Population Based Augmentation (PBA) [13], Adversarial AutoAugment [43] and the proposed RandAugment (RA) on CIFAR-10 [14], SVHN [24], and ImageNet [4] classification tasks. Search space size is reported as the order of magnitude of the number of possible augmentation policies. Dash indicates that results are not available
  • Table2: Test accuracy (%) on CIFAR-10, CIFAR-100, SVHN and SVHN core set. Comparisons across default data augmentation (baseline), Population Based Augmentation (PBA) [13], Fast AutoAugment (Fast AA) [17], Online Hyper-parameter Learning for Auto-Augmentation Strategy (OHL AA) [18], Adversarial AutoAugment (Adv AA) [43], AutoAugment (AA) [3] and proposed RandAugment (RA). Note that baseline and AA are replicated in this work. SVHN core set consists of 73K examples. The Shake-Shake model [8] employed a 26 2×96d configuration, and the PyramidNet model used the ShakeDrop regularization [38]. Results reported by us are averaged over 10 independent runs. Bold indicates best results
  • Table3: ImageNet results. Top-1 accuracy (%) on ImageNet. Baseline and AutoAugment (AA) results on ResNet-50 are from [3]. Fast AutoAugment (Fast AA) results are from [17]. Note that the ResNet-50 results for all augmentation methods used the baseline model with the same performance, which we reproduced in this paper. EfficientNet results with and without AutoAugment are from [34]. Highest accuracy for each model is presented in bold
  • Table4: Results on object detection. Mean average precision (mAP) on COCO detection task. Search space size is reported as the order of magnitude of the number of possible augmentation policies. Models are trained for 300 epochs from random initialization following [45]
  • Table5: Average improvement due to each transformation. Average difference in validation accuracy (%) when a particular transformation is added to a randomly sampled set of transformations. For this ablation study, Wide-ResNet-28-2 models were trained on CIFAR-10 using RandAugment (N = 3, M = 4) with the randomly sampled set of transformations, with no other data augmentation
Related work
  • Data augmentation has played a central role in the training of deep vision models. On natural images, horizontal flips and random cropping or translations of the images are commonly used in classification and detection models [41, 15, 9]. On MNIST, elastic distortions across scale, position, and orientation have been applied to achieve impressive results [30, 2, 36, 29]. While previous examples augment the data while keeping it in the training set distribution, operations that do the opposite can also be effective in increasing generalization. Some methods randomly erase or add noise to patches of images for increased validation accuracy [6, 44], robustness [33, 39, 7], or both [23]. Mixup [42] is a particularly effective augmentation method on CIFAR-10 and ImageNet, where the neural network is trained on convex combinations of images and their corresponding labels.
Funding
  • Google is the sole source of funding for this work
Study subjects and analysis

Figure 1: Optimal magnitude of augmentation depends on the size of the model and the training set. All results report CIFAR-10 validation accuracy for Wide-ResNet model architectures [41] averaged over 20 random initializations, where N = 1. (a) Accuracy of Wide-ResNet-28-2, Wide-ResNet-28-7, and Wide-ResNet-28-10 across varying distortion magnitudes; models are trained for 200 epochs on 45K training set examples. (b) Optimal distortion magnitude across 7 Wide-ResNet-28 architectures with varying widening parameters (k). (c) Accuracy of Wide-ResNet-28-10 for three training set sizes (1K, 4K, and 10K) across varying distortion magnitudes. (d) Optimal distortion magnitude across 8 training set sizes; dashed curves show the scaled expectation value of the distortion magnitude in the AutoAugment policy [3]. In panels (a) and (c), squares indicate the distortion magnitude that achieves the maximal accuracy.

Figure 1(a) shows the relative gain in accuracy of a model trained across increasing distortion magnitudes (M) for three Wide-ResNet models, and the results demonstrate a clear systematic trend across distortion magnitudes. In particular, plotting all Wide-ResNet architectures against their optimal distortion magnitude highlights a clear monotonic trend across increasing network sizes (Figure 1(b)): larger networks demand larger data distortions for regularization.

Figure 1(c) shows the relative gain in accuracy of Wide-ResNet-28-10 trained across increasing distortion magnitudes for varying amounts of CIFAR-10 training data. We observe that models trained on smaller training sets may gain more from data augmentation (e.g. 3.0% versus 1.5% in Figure 1(c)), yet the optimal distortion magnitude is larger for models trained on larger datasets. At first glance this may disagree with the expectation that smaller datasets require stronger regularization; indeed, Figure 1(d) demonstrates that the optimal distortion magnitude increases monotonically with training set size. One hypothesis for this counter-intuitive behavior is that aggressive data augmentation leads to a low signal-to-noise ratio in small datasets. Regardless, the trend highlights the need for increasing the strength of data augmentation on larger datasets, and it reveals a shortcoming of optimizing learned augmentation policies on a proxy task comprising a subset of the training data: the learned policy may adopt an augmentation strength tailored to the proxy task rather than the larger task of interest.

Figure: Python code for RandAugment based on NumPy.

Figure: Average performance improves when more transformations are included in RandAugment. All panels report median CIFAR-10 validation accuracy for Wide-ResNet-28-2 model architectures [41] trained with RandAugment (N = 3, M = 4) using randomly sampled subsets of transformations; no other data augmentation is included in training. Error bars indicate the 30th and 70th percentiles. (a) Median accuracy for randomly sampled subsets of transformations. (b) Median accuracy for subsets with and without the Rotate transformation. (c) Median accuracy for subsets with and without the translate-x transformation. (d) Median accuracy for subsets with and without the posterize transformation. Dashed curves show the accuracy of the model trained without any augmentations.

Panel (a) suggests that the median validation accuracy due to RandAugment improves as the number of transformations is increased. However, even with only two transformations, RandAugment leads to more than 1% improvement in validation accuracy on average.
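In the spirit of the NumPy-based snippet referenced above, a minimal sketch of the policy sampler: the list of transformation names follows the paper, but the actual image operations (PIL-based in the original) are omitted, so applying the sampled policy to images is left as an assumption.

```python
import numpy as np

# The 14 transformations used by RandAugment, per the paper.
TRANSFORMS = [
    'Identity', 'AutoContrast', 'Equalize', 'Rotate', 'Solarize',
    'Color', 'Posterize', 'Contrast', 'Brightness', 'Sharpness',
    'ShearX', 'ShearY', 'TranslateX', 'TranslateY',
]

def randaugment(n, m):
    """Sample a policy of n transformations, each applied at the shared
    integer magnitude m (sampling is uniform, with replacement)."""
    sampled_ops = np.random.choice(TRANSFORMS, n)
    return [(op, m) for op in sampled_ops]
```

Each training image would then be transformed by the n sampled operations in sequence at magnitude m; for example, `randaugment(2, 9)` might return a policy such as `[('Rotate', 9), ('Equalize', 9)]`.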

Reference
  • L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):834–848, 2017.
  • D. Ciregan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 3642–3649. IEEE, 2012.
  • E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le. AutoAugment: Learning augmentation policies from data. arXiv preprint arXiv:1805.09501, 2018.
  • J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
  • T. DeVries and G. W. Taylor. Dataset augmentation in feature space. arXiv preprint arXiv:1702.05538, 2017.
  • T. DeVries and G. W. Taylor. Improved regularization of convolutional neural networks with Cutout. arXiv preprint arXiv:1708.04552, 2017.
  • N. Ford, J. Gilmer, N. Carlini, and D. Cubuk. Adversarial examples are a natural consequence of test error in noise. arXiv preprint arXiv:1901.10513, 2019.
  • X. Gastaldi. Shake-shake regularization. arXiv preprint arXiv:1705.07485, 2017.
  • R. Girshick, I. Radosavovic, G. Gkioxari, P. Dollár, and K. He. Detectron, 2018.
  • K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
  • S. Hershey, S. Chaudhuri, D. P. Ellis, J. F. Gemmeke, A. Jansen, R. C. Moore, M. Plakal, D. Platt, R. A. Saurous, B. Seybold, et al. CNN architectures for large-scale audio classification. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 131–135. IEEE, 2017.
  • G. Hinton, L. Deng, D. Yu, G. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, B. Kingsbury, et al. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29, 2012.
  • D. Ho, E. Liang, I. Stoica, P. Abbeel, and X. Chen. Population based augmentation: Efficient learning of augmentation policy schedules. arXiv preprint arXiv:1905.05393, 2019.
  • A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012.
  • J. Lemley, S. Bazrafkan, and P. Corcoran. Smart augmentation: Learning an optimal data augmentation strategy. IEEE Access, 5:5858–5869, 2017.
  • S. Lim, I. Kim, T. Kim, C. Kim, and S. Kim. Fast AutoAugment. arXiv preprint arXiv:1905.00397, 2019.
  • C. Lin, M. Guo, C. Li, W. Wu, D. Lin, W. Ouyang, and J. Yan. Online hyper-parameter learning for auto-augmentation strategy. arXiv preprint arXiv:1905.07373, 2019.
  • T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 2980–2988, 2017.
  • T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755, 2014.
  • C. Liu, B. Zoph, J. Shlens, W. Hua, L.-J. Li, L. Fei-Fei, A. Yuille, J. Huang, and K. Murphy. Progressive neural architecture search. arXiv preprint arXiv:1712.00559, 2017.
  • H. Liu, K. Simonyan, and Y. Yang. DARTS: Differentiable architecture search. arXiv preprint arXiv:1806.09055, 2018.
  • R. G. Lopes, D. Yin, B. Poole, J. Gilmer, and E. D. Cubuk. Improving robustness without sacrificing accuracy with Patch Gaussian augmentation. arXiv preprint arXiv:1906.02611, 2019.
  • Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
  • J. Ngiam, B. Caine, W. Han, B. Yang, Y. Chai, P. Sun, Y. Zhou, X. Yi, O. Alsharif, P. Nguyen, et al. StarNet: Targeted computation for object detection in point clouds. arXiv preprint arXiv:1908.11069, 2019.
  • D. S. Park, W. Chan, Y. Zhang, C.-C. Chiu, B. Zoph, E. D. Cubuk, and Q. V. Le. SpecAugment: A simple data augmentation method for automatic speech recognition. arXiv preprint arXiv:1904.08779, 2019.
  • A. J. Ratner, H. Ehrenberg, Z. Hussain, J. Dunnmon, and C. Ré. Learning to compose domain-specific transformations for data augmentation. In Advances in Neural Information Processing Systems, pages 3239–3249, 2017.
  • B. Recht, R. Roelofs, L. Schmidt, and V. Shankar. Do ImageNet classifiers generalize to ImageNet? arXiv preprint arXiv:1902.10811, 2019.
  • I. Sato, H. Nishimura, and K. Yokoi. APAC: Augmented pattern classification with neural networks. arXiv preprint arXiv:1505.03229, 2015.
  • P. Y. Simard, D. Steinkraus, J. C. Platt, et al. Best practices for convolutional neural networks applied to visual document analysis. In Proceedings of International Conference on Document Analysis and Recognition, 2003.
  • J. Snoek, H. Larochelle, and R. P. Adams. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pages 2951–2959, 2012.
  • C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the Inception architecture for computer vision. In CVPR, pages 2818–2826, 2016.
  • C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
  • M. Tan and Q. V. Le. EfficientNet: Rethinking model scaling for convolutional neural networks. arXiv preprint arXiv:1905.11946, 2019.
  • T. Tran, T. Pham, G. Carneiro, L. Palmer, and I. Reid. A Bayesian data augmentation approach for learning deep models. In Advances in Neural Information Processing Systems, pages 2794–2803, 2017.
  • L. Wan, M. Zeiler, S. Zhang, Y. LeCun, and R. Fergus. Regularization of neural networks using DropConnect. In International Conference on Machine Learning, pages 1058–1066, 2013.
  • Q. Xie, Z. Dai, E. Hovy, M.-T. Luong, and Q. V. Le. Unsupervised data augmentation. arXiv preprint arXiv:1904.12848, 2019.
  • Y. Yamada, M. Iwamura, and K. Kise. ShakeDrop regularization. arXiv preprint arXiv:1802.02375, 2018.
  • D. Yin, R. G. Lopes, J. Shlens, E. D. Cubuk, and J. Gilmer. A Fourier perspective on model robustness in computer vision. arXiv preprint arXiv:1906.08988, 2019.
  • K. Yu, C. Sciuto, M. Jaggi, C. Musat, and M. Salzmann. Evaluating the search phase of neural architecture search. arXiv preprint arXiv:1902.08142, 2019.
  • S. Zagoruyko and N. Komodakis. Wide residual networks. In British Machine Vision Conference, 2016.
  • H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017.
  • X. Zhang, Q. Wang, J. Zhang, and Z. Zhong. Adversarial AutoAugment. arXiv preprint arXiv:1912.11188, 2019.
  • Z. Zhong, L. Zheng, G. Kang, S. Li, and Y. Yang. Random erasing data augmentation. arXiv preprint arXiv:1708.04896, 2017.
  • B. Zoph, E. D. Cubuk, G. Ghiasi, T.-Y. Lin, J. Shlens, and Q. V. Le. Learning data augmentation strategies for object detection. arXiv preprint arXiv:1906.11172, 2019.
  • B. Zoph and Q. V. Le. Neural architecture search with reinforcement learning. In International Conference on Learning Representations, 2017.
  • B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le. Learning transferable architectures for scalable image recognition. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2017.