WordReg: Mitigating the Gap between Training and Inference with Worst-Case Drop Regularization

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023)

Dropout has emerged as one of the most frequently used techniques for training deep neural networks (DNNs). Although effective, the sub-model sampled by random dropout during training is inconsistent with the full model (without dropout) used during inference. To mitigate this undesirable gap, we propose WordReg, a simple yet effective regularization built on dropout that enforces consistency between the outputs of different sub-models sampled by dropout. Specifically, WordReg first obtains the worst-case dropout by maximizing the divergence between the outputs of two sub-models with different random dropouts. It then encourages agreement between the outputs of the two sub-models with the worst-case divergence. Extensive experiments on diverse DNNs and tasks show that WordReg achieves notable and consistent improvements over non-regularized models and yields some state-of-the-art results. Theoretically, we verify that WordReg can reduce the gap between training and inference.
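The two steps described above (find the pair of dropout sub-models whose outputs diverge most, then penalize that divergence) can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation. The abstract does not say how the maximization is carried out, so the sketch assumes a simple stand-in: sample a few candidate masks and keep the most divergent pair. The toy linear-softmax model, the symmetric KL divergence, and all names (`forward`, `sym_kl`, `worst_case_consistency`, `num_candidates`) are hypothetical choices made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, weights, mask, keep_prob):
    """Toy sub-model: inverted dropout on the inputs, a linear layer, then softmax."""
    dropped = [xi * mi / keep_prob for xi, mi in zip(x, mask)]
    logits = [sum(w * d for w, d in zip(row, dropped)) for row in weights]
    return softmax(logits)


def sym_kl(p, q, eps=1e-12):
    """Symmetric KL divergence (an assumed choice of divergence)."""
    kl_pq = sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    kl_qp = sum(qi * math.log((qi + eps) / (pi + eps)) for pi, qi in zip(p, q))
    return 0.5 * (kl_pq + kl_qp)


def sample_mask(n, keep_prob, rng):
    """Bernoulli keep-mask of length n."""
    return [1.0 if rng.random() < keep_prob else 0.0 for _ in range(n)]


def worst_case_consistency(x, weights, keep_prob=0.8, num_candidates=8, rng=None):
    """Return the largest pairwise divergence among sampled sub-models
    and the pair of masks that produced it.

    Assumption: the worst case is approximated by sampling, not by the
    (unspecified) maximization procedure used in the paper.
    """
    if num_candidates < 2:
        raise ValueError("need at least two candidate masks")
    rng = rng or random.Random(0)
    masks = [sample_mask(len(x), keep_prob, rng) for _ in range(num_candidates)]
    outputs = [forward(x, weights, m, keep_prob) for m in masks]

    worst, pair = 0.0, (masks[0], masks[1])
    for i in range(num_candidates):
        for j in range(i + 1, num_candidates):
            d = sym_kl(outputs[i], outputs[j])
            if d > worst:
                worst, pair = d, (masks[i], masks[j])
    return worst, pair


# Usage: the regularizer would be added to the task loss with some weight.
x = [0.5, -1.2, 0.3, 2.0]
weights = [[0.1, 0.4, -0.3, 0.2], [-0.5, 0.2, 0.1, 0.3], [0.3, -0.1, 0.6, -0.2]]
reg, (mask_a, mask_b) = worst_case_consistency(x, weights, keep_prob=0.8)
```

In a training loop, `reg` would be scaled by a hyperparameter and added to the ordinary task loss, so that gradient descent pulls the two most-disagreeing sub-models toward each other.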
Image Recognition, Language Understanding, Graph Mining, Dropout, Regularization