Robust Backdoor Removal by Reconstructing Trigger-Activated Changes in Latent Representation

arXiv — cs.LG · Thursday, November 13, 2025 at 5:00:00 AM
Backdoor attacks pose a critical challenge for machine learning: they cause models to misclassify poisoned inputs while behaving normally on clean ones. Traditional defenses struggle to pinpoint backdoor neurons because they estimate trigger-activated changes (TACs) in the latent representation inaccurately. A novel method reconstructs TAC values more accurately by framing the problem as a convex quadratic optimization task, enabling more reliable identification of poisoned classes and effective fine-tuning to eliminate backdoors. Experiments on CIFAR-10, GTSRB, and TinyImageNet show that the approach consistently outperforms existing methods, achieving both high clean accuracy and strong backdoor suppression. The research is significant because it hardens machine learning models against sophisticated attacks, improving their reliability in real-world applications.
— via World Pulse Now AI Editorial System
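The summary does not give the paper's exact formulation, but the convex-QP framing can be illustrated. Below is a minimal Python sketch (using cvxpy; every dimension, matrix, and class index is a stand-in, not from the paper) that recovers a trigger-activated latent shift as the smallest additive change pushing clean samples of a victim class across a linear classifier's boundary toward a suspected target class:

```python
import cvxpy as cp
import numpy as np

# Stand-in dimensions and data; none of these come from the paper.
d, n = 64, 128                       # latent dimension, number of clean samples
rng = np.random.default_rng(0)
H = rng.normal(size=(n, d))          # clean latent representations h_i
W = rng.normal(size=(10, d))         # final linear layer weights, 10 classes
y, t = 3, 7                          # victim class, suspected target class

delta = cp.Variable(d)               # candidate TAC: additive latent shift
margin = 1.0
# Convex QP: the smallest shift that makes class t beat class y by a
# margin on every clean sample (quadratic objective, linear constraints).
constraints = [(W[t] - W[y]) @ (H[i] + delta) >= margin for i in range(n)]
problem = cp.Problem(cp.Minimize(cp.sum_squares(delta)), constraints)
problem.solve()
print("reconstructed TAC norm:", np.linalg.norm(delta.value))
```

How the recovered shift is then used, for instance flagging poisoned classes by its magnitude and fine-tuning the neurons that carry it, is an assumption about the method rather than a detail given in this summary.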


Recommended Readings
Enhanced Structured Lasso Pruning with Class-wise Information
Positive · Artificial Intelligence
The paper titled 'Enhanced Structured Lasso Pruning with Class-wise Information' discusses advancements in neural network pruning methods. Traditional pruning techniques often overlook class-wise information, risking the loss of useful statistical information. This study introduces two new pruning schemes, sparse graph-structured lasso pruning with Information Bottleneck (sGLP-IB) and sparse tree-guided lasso pruning with Information Bottleneck (sTLP-IB), which aim to preserve that statistical information while reducing model complexity.
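The graph and tree structures and the Information Bottleneck terms of sGLP-IB and sTLP-IB are not described in this summary; the structured-lasso core, however, is standard. Here is a minimal PyTorch sketch of a plain group lasso penalty with one group per convolutional filter, so the regularizer zeroes out whole filters rather than individual weights:

```python
import torch
import torch.nn as nn

def group_lasso_penalty(conv: nn.Conv2d) -> torch.Tensor:
    # Structured (group) lasso: one group per output filter, so the
    # penalty drives entire filters to zero rather than single weights.
    # conv.weight has shape (out_channels, in_channels, kH, kW).
    w = conv.weight.view(conv.out_channels, -1)
    return w.norm(dim=1).sum()       # sum of per-filter L2 norms

# Usage: add the penalty to the task loss during training.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
lam = 1e-4                           # sparsity strength (hypothetical value)
x = torch.randn(8, 3, 32, 32)
loss = model(x).abs().mean()         # stand-in for the real task loss
loss = loss + lam * sum(group_lasso_penalty(m)
                        for m in model.modules() if isinstance(m, nn.Conv2d))
loss.backward()
```

Filters whose norm is driven to near zero can then be pruned with little effect on the network's output.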
AMUN: Adversarial Machine UNlearning
Positive · Artificial Intelligence
The paper titled 'AMUN: Adversarial Machine UNlearning' discusses a novel method for machine unlearning, which allows users to delete specific datasets to comply with privacy regulations. Traditional exact unlearning methods require significant computational resources, while approximate methods have not achieved satisfactory accuracy. The proposed Adversarial Machine UNlearning (AMUN) technique instead fine-tunes the model on adversarial examples, reducing its confidence on forgotten samples while maintaining accuracy on the test set.
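AMUN's implementation details (attack type, labeling rule, perturbation budget) are not in this summary. The sketch below assumes single-step FGSM and uses the model's own prediction on the perturbed input as the fine-tuning label, which is one plausible reading of "fine-tuning on adversarial examples":

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    # Single-step FGSM: nudge x in the direction that increases the
    # loss on its original label y. eps is a hypothetical budget.
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).detach().clamp(0, 1)

def unlearn_step(model, optimizer, x_forget, y_forget):
    # Fine-tune on adversarial versions of the forget samples, labeled
    # with the model's own prediction on the perturbed input; this
    # lowers confidence on the originals without touching retained data.
    # (The labeling rule is an assumption, not the paper's exact recipe.)
    x_adv = fgsm(model, x_forget, y_forget)
    with torch.no_grad():
        y_adv = model(x_adv).argmax(dim=1)
    optimizer.zero_grad()
    F.cross_entropy(model(x_adv), y_adv).backward()
    optimizer.step()
```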
Orthogonal Soft Pruning for Efficient Class Unlearning
Positive · Artificial Intelligence
The article discusses FedOrtho, a framework for class unlearning in federated learning environments. It addresses the challenge of balancing forgetting and retention, particularly in non-IID settings. FedOrtho employs orthogonalized deep convolutional kernels and a one-shot soft pruning mechanism, achieving state-of-the-art performance on datasets like CIFAR-10 and TinyImageNet, with over 98% forgetting quality and 97% retention accuracy.
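FedOrtho's kernel orthogonalization is not specified here; one common way to encourage mutually orthogonal filters is a soft penalty on the off-diagonal of the filter Gram matrix, sketched below in PyTorch (the coupling to the one-shot soft pruning step is omitted, and the penalty weight is hypothetical):

```python
import torch
import torch.nn as nn

def kernel_orthogonality_penalty(conv: nn.Conv2d) -> torch.Tensor:
    # Soft orthogonality: penalize the off-diagonal mass of the filter
    # Gram matrix W W^T so filters occupy near-orthogonal directions
    # (the identity target also pushes filters toward unit norm).
    w = conv.weight.view(conv.out_channels, -1)
    gram = w @ w.t()
    eye = torch.eye(conv.out_channels, device=w.device)
    return ((gram - eye) ** 2).sum()

# Usage: add lam * kernel_orthogonality_penalty(conv) to the training
# loss for each convolutional layer, with lam a small weight.
```

The apparent design intuition is that orthogonal filters keep class-specific features in separable directions, so softly pruning the filters tied to the forgotten class removes it with little collateral damage to retained classes.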
On the Necessity of Output Distribution Reweighting for Effective Class Unlearning
Positive · Artificial Intelligence
The paper titled 'On the Necessity of Output Distribution Reweighting for Effective Class Unlearning' identifies a critical flaw in class unlearning evaluations, specifically the neglect of class geometry, which can lead to privacy breaches. It introduces a membership-inference attack via nearest neighbors (MIA-NN) to identify unlearned samples. The authors propose a new fine-tuning objective that adjusts the model's output distribution to mitigate privacy risks, demonstrating that existing unlearning methods are susceptible to MIA-NN across various datasets.
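The summary does not define MIA-NN precisely; the sketch below encodes one plausible reading: after class unlearning, a forgotten sample's probability mass tends to shift to its geometrically nearest remaining class, so scoring queries by the model's confidence on the nearest neighbor's label can separate unlearned members from non-members. The function name, inputs, and scoring rule are all assumptions.

```python
import numpy as np

def mia_nn_scores(query_probs, query_feats, ref_feats, ref_labels):
    # Score each query by the model's probability on the class of its
    # nearest reference neighbor; unusually high scores suggest an
    # unlearned training member whose output mass was redistributed
    # to the geometrically nearest remaining class.
    scores = []
    for probs, feat in zip(query_probs, query_feats):
        nn_idx = np.linalg.norm(ref_feats - feat, axis=1).argmin()
        scores.append(probs[ref_labels[nn_idx]])
    return np.array(scores)   # threshold to flag likely members
```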
PrivDFS: Private Inference via Distributed Feature Sharing against Data Reconstruction Attacks
Positive · Artificial Intelligence
The paper introduces PrivDFS, a distributed feature-sharing framework designed for input-private inference in image classification. It addresses vulnerabilities in split inference that allow Data Reconstruction Attacks (DRAs) to reconstruct inputs with high fidelity. By fragmenting the intermediate representation and processing the fragments independently across a majority-honest set of servers, PrivDFS limits an attacker's ability to reconstruct the input while keeping accuracy within 1% of non-private inference.
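How PrivDFS fragments the representation is not detailed in this summary. The sketch below uses channel partitioning as one illustrative choice; fragment_features is a hypothetical helper, and the paper's actual fragmentation scheme may differ:

```python
import numpy as np

def fragment_features(z: np.ndarray, n_servers: int, rng) -> list:
    # Split an intermediate representation z of shape (C, H, W) into
    # disjoint channel subsets, one per server. Each server only ever
    # sees its own fragment, limiting what a reconstruction attack can
    # recover from any single share.
    channels = rng.permutation(z.shape[0])
    return [z[np.sort(idx)] for idx in np.array_split(channels, n_servers)]

z = np.random.default_rng(0).normal(size=(64, 8, 8))
fragments = fragment_features(z, n_servers=3, rng=np.random.default_rng(1))
print([f.shape for f in fragments])   # [(22, 8, 8), (21, 8, 8), (21, 8, 8)]
```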