AutoAugment: Learning Augmentation Policies from Data
arXiv preprint arXiv:1805.09501, 2018.
The ablation experiments indicate that even data augmentation policies randomly sampled from our search space can lead to improvements on CIFAR-10 over the baseline augmentation policy.
In this paper, we take a closer look at data augmentation for images and describe a simple procedure called AutoAugment to search for improved data augmentation policies. Our key insight is to create a search space of data augmentation policies and to evaluate the quality of a particular policy directly on the dataset of interest.
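As a rough illustration of the scale of such a search space, the paper discretizes each operation's probability into 11 values and its magnitude into 10 values, with 16 operations, 2 operations per sub-policy, and 5 sub-policies per policy. Under those counts, the size of the space follows directly:

```python
# Back-of-the-envelope size of the AutoAugment search space, assuming the
# paper's discretization: 16 operations, 11 probability values, 10 magnitude
# values, 2 operations per sub-policy, and 5 sub-policies per policy.
ops, probs, mags = 16, 11, 10
per_op = ops * probs * mags          # 1760 choices for a single operation
per_sub_policy = per_op ** 2         # two operations per sub-policy
policy_space = per_sub_policy ** 5   # five sub-policies per policy
print(f"{policy_space:.2e}")         # on the order of 10**32
```

This order of magnitude (~2.9 × 10^32 possibilities) is why the policy is found by search rather than by exhaustive enumeration.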
- Deep neural nets are powerful machine learning systems that tend to work well when trained on massive amounts of data.
- A child model is trained with augmented data generated by applying the 5 sub-policies on the training set.
- We implement the Wide-ResNet-28-10, Shake-Shake, and ShakeDrop models in TensorFlow, and find the weight decay and learning rate hyperparameters that give the best validation set accuracy for regular training with baseline augmentation.
- We use a reduced subset of the ImageNet training set, with 120 classes and 6,000 samples, to search for policies.
- For models trained with AutoAugment, we use the baseline pre-processing and the policy learned on ImageNet. We find that removing the random distortions of color does not change the results for AutoAugment.
- As can be seen from the results, AutoAugment improves over the widely used Inception pre-processing across a wide range of models, from ResNet-50 to the state-of-the-art AmoebaNets.
- To evaluate the transferability of the policy found on ImageNet, we use the same policy that is learned on ImageNet on five FGVC datasets with image size similar to ImageNet. These datasets are challenging as they have relatively small sets of training examples while having a large number of classes.
- After the policy is learned, the full model is trained for longer (e.g. 1800 epochs for Shake-Shake on CIFAR-10, and 270 epochs for ResNet-50 on ImageNet), which allows us to use more sub-policies.
- The policy learned on Wide-ResNet-40-2 and reduced CIFAR-10 leads to the improvements described on all of the other model architectures trained on full CIFAR-10 and CIFAR-100.
- A policy learned on Wide-ResNet-40-2 and reduced ImageNet leads to significant improvements on Inception v4 trained on FGVC datasets that have different data and class distributions.
- AutoAugment policies are never found to hurt the performance of models even if they are learned on a different dataset, which is not the case for Cutout on reduced SVHN (Table 2).
- We find that policies learned on data distributions closest to the target yield the best performance: when training on SVHN, using the best policy learned on reduced CIFAR-10 does slightly improve generalization accuracy compared to the baseline augmentation, but not as significantly as applying the SVHN-learned policy.
- We investigate the average validation accuracy of fully-trained Wide-ResNet-28-10 models on CIFAR-10 as a function of the number of sub-policies used in training.
- For direct application, our method achieves state-of-the-art accuracy on CIFAR-10, reduced CIFAR-10, CIFAR-100, SVHN, reduced SVHN, and ImageNet (without additional data).
- On CIFAR-10, we achieve an error rate of 1.5%, which is 0.6% better than the previous state-of-the-art.
- On SVHN, we improve the state-of-the-art error rate from 1.3% to 1.0%.
- On reduced datasets, our method achieves performance comparable to semi-supervised methods without using any unlabeled data.
- On ImageNet, we attain a Top-1 accuracy of 83.5%, which is 0.4% better than the previous record of 83.1%.
- As can be seen from the table, we achieve an error rate of 1.5% with the ShakeDrop model, which is 0.6% better than the state-of-the-art.
- Recht et al. report that Shake-Shake (26 2x64d) + Cutout performs best on this new dataset, with an error rate of 7.0% (4.1% higher relative to the error rate on the original CIFAR-10 test set).
- Furthermore, PyramidNet+ShakeDrop achieves an error rate of 7.7% on the new dataset (4.6% higher relative to the original test set)
- Our best model, PyramidNet+ShakeDrop trained with AutoAugment achieves an error rate of 4.4% (2.9% higher than the error rate on the original set)
- We find the average error to be 3.0% (with a standard deviation of 0.1%), which is 0.4% worse than the result achieved with the original AutoAugment policy (see Table 2)
- The improvements exhibited by random policies are smaller than those shown by the AutoAugment policy (3.0% ± 0.1% vs. 2.6% ± 0.1% error rate).
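The policy structure used throughout, and the random-sampling baseline from the ablation above, can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation: the operation names are a subset of the real set, and the image operations are toy stand-ins.

```python
import random

# Sketch of an AutoAugment-style policy and of sampling a random policy from
# the same search space (the random-policy baseline in the ablation).
# Operation names and the string "images" below are illustrative stand-ins.

OP_NAMES = ["ShearX", "ShearY", "Rotate", "Color", "Posterize"]

def sample_sub_policy(rng):
    # A sub-policy is two (operation, probability, magnitude) triples, with
    # probability discretized into 11 values and magnitude into 10 buckets.
    return [(rng.choice(OP_NAMES), rng.randrange(11) / 10.0, rng.randrange(10))
            for _ in range(2)]

def sample_policy(rng, n_sub_policies=5):
    # A policy is a list of sub-policies; here 5, as in the search phase.
    return [sample_sub_policy(rng) for _ in range(n_sub_policies)]

def apply_sub_policy(img, sub_policy, ops, rng):
    # Apply each operation in order, each fired with its own probability.
    for name, prob, mag in sub_policy:
        if rng.random() < prob:
            img = ops[name](img, mag)
    return img

def augment(img, policy, ops, rng):
    # During training, one sub-policy is chosen uniformly at random per image.
    return apply_sub_policy(img, rng.choice(policy), ops, rng)
```

In the ablation above, even policies drawn by `sample_policy`-style uniform sampling improved on the baseline augmentation, though by less than the searched policy.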