Deep neural networks are susceptible to adversarial attacks due to the accumulation of perturbations at the feature level, and numerous works have boosted model robustness by deactivating the non-robust feature
activations that cause model mispredictions. However, we claim that these malicious activations still
contain discriminative cues and that, with recalibration, they can capture additional useful information for correct model predictions. To this end, we propose a novel, easy-to-plug-in approach named Feature Separation
and Recalibration (FSR) that recalibrates the malicious, non-robust activations for more robust feature maps
through Separation and Recalibration. The Separation part disentangles the input feature map into the robust
feature with activations that help the model make correct predictions and the non-robust feature with
activations that are responsible for model mispredictions upon adversarial attack. The Recalibration part
then adjusts the non-robust activations to restore the potentially useful cues for model predictions.
Extensive experiments verify the superiority of FSR compared to traditional deactivation techniques and
demonstrate that it improves the robustness of existing adversarial training methods by up to 8.57% with
small computational overhead. Code is available at this https URL.
Figure 1: Adversarial attacks disrupt the intermediate feature maps learned by deep neural networks, leading to model mispredictions.
(a) Conventional approaches have focused on suppressing or deactivating these disrupted activations, which leads to the loss of potentially useful cues that the model exploits on natural images.
(b) We instead propose to restore useful cues from these disrupted activations that are otherwise neglected. This additional useful information better guides the model to make correct predictions under attack.
Feature Separation and Recalibration (FSR)
Figure 2: We propose the Feature Separation and Recalibration (FSR) module to restore useful cues for predictions from disrupted feature activations. The Separation stage disentangles the input feature into the robust feature, responsible for correct model predictions, and the non-robust feature, responsible for model mispredictions. Then, the Recalibration stage recalibrates the non-robust feature into the recalibrated feature to restore useful cues for correct model predictions. The combined output of the robust and recalibrated features is passed down to subsequent layers of the model.
FSR is attachable to any CNN model and can be trained with any adversarial training technique in an end-to-end manner.
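To make the pipeline concrete, here is a minimal NumPy sketch of a separate–recalibrate–recombine module. The soft sigmoid mask and the single channel-mixing linear map are illustrative assumptions standing in for the paper's learned Separation and Recalibration Nets, not the actual method:

```python
import numpy as np

def fsr(feature, sep_logits, rec_weight, rec_bias):
    """Sketch of an FSR-style module on a (C, H, W) feature map.
    sep_logits, rec_weight, rec_bias are hypothetical learned parameters."""
    # Separation: a soft robustness mask splits the feature activation-wise.
    mask = 1.0 / (1.0 + np.exp(-sep_logits))   # robustness score in (0, 1)
    f_robust = mask * feature                  # activations kept as robust
    f_nonrobust = (1.0 - mask) * feature       # activations deemed non-robust

    # Recalibration: a stand-in linear map over channels produces
    # recalibrating units that rescale the non-robust activations.
    c = feature.shape[0]
    flat = f_nonrobust.reshape(c, -1)          # (C, H*W)
    units = np.tanh(rec_weight @ flat + rec_bias[:, None])
    f_recal = (units * flat).reshape(feature.shape)

    # Recombination: the sum is passed to subsequent layers.
    return f_robust + f_recal
```

Because the module maps a feature map to a feature map of the same shape, it can be inserted after any convolutional block without altering the rest of the network, which matches the plug-in property described above.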
Figure 3: We design the Separation Net to learn the robustness of each feature activation based on its relevance to correct model predictions. We disentangle the input feature activation-wise into the robust feature and the non-robust feature based on this robustness.
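The activation-wise disentanglement can be sketched as below; the sigmoid scoring of per-activation logits is an assumed stand-in for the learned Separation Net. Note that the two parts sum back to the input feature exactly:

```python
import numpy as np

def separate(feature, robustness_logits):
    """Sketch of activation-wise separation: a sigmoid of per-activation
    logits (a hypothetical stand-in for the learned Separation Net) scores
    each activation's robustness and splits the feature accordingly."""
    score = 1.0 / (1.0 + np.exp(-robustness_logits))
    f_robust = score * feature             # high-robustness activations
    f_nonrobust = (1.0 - score) * feature  # low-robustness activations
    return f_robust, f_nonrobust
```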
Figure 4: We design the Recalibration Net to learn the recalibrating units that restore useful cues for correct predictions from the non-robust feature.
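One plausible minimal form of such recalibrating units is sketched below: a single channel-mixing linear map, bounded by tanh, that rescales the non-robust activations. This is an assumed simplification for illustration, not the paper's actual Recalibration Net:

```python
import numpy as np

def recalibrate(f_nonrobust, weight, bias):
    """Sketch of recalibration on a (C, H, W) non-robust feature.
    weight (C, C) and bias (C,) are hypothetical learned parameters."""
    c, h, w = f_nonrobust.shape
    flat = f_nonrobust.reshape(c, -1)               # (C, H*W)
    units = np.tanh(weight @ flat + bias[:, None])  # recalibrating units
    return (units * flat).reshape(c, h, w)          # recalibrated feature
```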
Table 1: Robustness (accuracy (%)) of various adversarial training strategies (AT, TRADES, and MART) upon adding our
FSR module (+ FSR).
We evaluate each method on ResNet-18 on the CIFAR-10 and SVHN datasets.
Our FSR module consistently improves the robustness against various attacks.
Better results are marked in bold.
Please refer to our paper for more results on other models (VGG16, WideResNet-34-10) and datasets
(CIFAR-100, Tiny ImageNet).
Figure 5: Visualization of attention maps on the features of natural images (Natural) and the robust, non-robust, and recalibrated features of the adversarial images.
The robust feature captures discriminative cues regarding the ground-truth class, while the non-robust feature captures cues irrelevant to it. To further boost feature robustness, we recalibrate the non-robust feature and capture additional useful cues for model predictions.