The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023
Woo Jae Kim,
Yoonki Cho,
Junsik Jung, and
Sung-Eui Yoon
Korea Advanced Institute of Science and Technology (KAIST)
Deep neural networks are susceptible to adversarial attacks due to the accumulation of perturbations at the feature level, and numerous works have boosted model robustness by deactivating the non-robust feature activations that cause model mispredictions. However, we claim that these malicious activations still contain discriminative cues and that, with recalibration, they can capture additional useful information for correct model predictions. To this end, we propose a novel, easy-to-plug-in approach named Feature Separation and Recalibration (FSR) that recalibrates the malicious, non-robust activations for more robust feature maps through Separation and Recalibration. The Separation part disentangles the input feature map into the robust feature, with activations that help the model make correct predictions, and the non-robust feature, with activations that are responsible for model mispredictions upon adversarial attack. The Recalibration part then adjusts the non-robust activations to restore the potentially useful cues for model predictions. Extensive experiments verify the superiority of FSR compared to traditional deactivation techniques and demonstrate that it improves the robustness of existing adversarial training methods by up to 8.57% with small computational overhead. Code is available at this https URL.
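As a rough illustration of the idea, the sketch below shows what an FSR-style module might look like in PyTorch: a separation stage that scores the robustness of each activation and splits the feature map, followed by a recalibration stage applied to the non-robust part. The layer choices and the sigmoid soft-mask gating are assumptions made for illustration, not the authors' reference implementation.

```python
# Minimal sketch of an FSR-style module (illustrative only; the gating and
# layer choices are assumptions, not the authors' reference implementation).
import torch
import torch.nn as nn

class FSR(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Separation: score the robustness of each activation in [0, 1].
        self.separation = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Recalibration: adjust the non-robust activations to restore useful cues.
        self.recalibration = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        m = self.separation(f)                     # per-activation robustness scores
        f_robust = m * f                           # cues that support correct predictions
        f_nonrobust = (1.0 - m) * f                # activations behind mispredictions
        f_recal = self.recalibration(f_nonrobust)  # restored useful cues
        return f_robust + f_recal                  # combined feature for subsequent layers
```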
Figure 1: Adversarial attacks disrupt the intermediate feature maps learned by deep neural networks, leading to mispredictions. (a) Conventional approaches have focused on suppressing or deactivating these disrupted activations, which can discard potentially useful cues that the model exploits on natural images. (b) We instead propose to restore useful cues from these disrupted activations that are otherwise neglected. This additional useful information better guides the model to make correct predictions under attack.
Figure 2: We propose the Feature Separation and Recalibration (FSR) module to restore useful cues for predictions from disrupted feature activations. The Separation stage disentangles the input feature into the robust feature responsible for correct model predictions and the non-robust feature responsible for model mispredictions. The Recalibration stage then recalibrates the non-robust feature into the recalibrated feature to restore useful cues for correct model predictions. The combined output feature, built from the robust and recalibrated features, is passed down to subsequent layers of the model. FSR is attachable to any CNN model and can be trained with any adversarial training technique in an end-to-end manner.
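The sketch below illustrates this plug-in property: the FSR module sketched above is attached after an intermediate stage of a standard torchvision ResNet-18 (a hypothetical placement after layer3, which outputs 256 channels) and trained end-to-end with plain PGD adversarial training. The attack budget, layer choice, and optimizer settings are illustrative assumptions, and any auxiliary losses used by the full method are omitted here.

```python
# Sketch: attach the FSR module above to a ResNet-18 and train with PGD-AT.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(num_classes=10)
model.layer3 = nn.Sequential(model.layer3, FSR(channels=256))  # plug FSR in after layer3
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Craft untargeted L-inf PGD adversarial examples for training."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def train_step(x, y):
    model.eval()
    x_adv = pgd_attack(model, x, y)          # generate adversarial examples on the fly
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)  # FSR is trained jointly with the backbone
    loss.backward()
    optimizer.step()
    return loss.item()
```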
Figure 3: We design the Separation Net to learn the robustness of each feature activation based on its relevance to correct predictions. Based on this robustness, we disentangle the input feature activation-wise into the robust feature and the non-robust feature, as sketched below.
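A standalone sketch of how such a separation stage might be implemented follows; it could replace the inline separation branch in the earlier FSR sketch. The conv-BN-ReLU scoring stack and the sigmoid soft mask are assumptions, not the exact Separation Net design.

```python
# Sketch of a Separation-Net-style block (design details are assumptions).
import torch
import torch.nn as nn

class SeparationNet(nn.Module):
    """Scores the robustness of each activation and splits the feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, f: torch.Tensor):
        m = torch.sigmoid(self.score(f))  # per-activation robustness in [0, 1]
        return m * f, (1.0 - m) * f       # robust feature, non-robust feature

# Example: split a batch of intermediate feature maps.
f = torch.randn(8, 256, 16, 16)
f_robust, f_nonrobust = SeparationNet(256)(f)
```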
Figure 4: We design the Recalibration Net to learn the recalibrating units that restore useful cues for correct predictions from the non-robust feature.
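Analogously, the sketch below shows one plausible form for such a recalibration stage, interchangeable with the inline recalibration branch in the earlier FSR sketch; the additive residual form of the recalibrating units is an assumption for illustration.

```python
# Sketch of a Recalibration-Net-style block (the residual form is an assumption).
import torch
import torch.nn as nn

class RecalibrationNet(nn.Module):
    """Learns recalibrating units from the non-robust feature and applies them."""
    def __init__(self, channels: int):
        super().__init__()
        self.units = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, f_nonrobust: torch.Tensor) -> torch.Tensor:
        r = self.units(f_nonrobust)  # recalibrating units
        return f_nonrobust + r       # recalibrated feature with restored cues

# Example: recalibrate the non-robust feature produced by the separation stage.
f_nonrobust = torch.randn(8, 256, 16, 16)
f_recalibrated = RecalibrationNet(256)(f_nonrobust)
```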
Table 1: Robustness (accuracy, %) of various adversarial training strategies (AT, TRADES, and MART) upon adding our FSR module (+ FSR). We evaluate each method with ResNet-18 on the CIFAR-10 and SVHN datasets. Our FSR module consistently improves the robustness against various attacks. Better results are marked in bold. Please refer to our paper for more results on other models (VGG16, WideResNet-34-10) and datasets (CIFAR-100, Tiny ImageNet).
Figure 5: Visualization of attention maps on the features of natural images (Natural) and on the robust, non-robust, and recalibrated features of adversarial images. The robust feature captures discriminative cues regarding the ground-truth class, while the non-robust feature captures irrelevant cues. To further boost feature robustness, we recalibrate the non-robust feature to capture additional useful cues for model predictions.