Breaking the Adversarial Robustness-Performance Trade-off in Text Classification via Manifold Purification

arXiv — cs.CL — Wednesday, November 12, 2025 at 5:00:00 AM
The introduction of the Manifold-Correcting Causal Flow (MC^2F) represents a significant advance in text classification, particularly in overcoming the longstanding trade-off between adversarial robustness and performance. Traditional approaches often sacrifice clean-data performance to gain robustness against adversarial attacks. MC^2F instead uses a two-module system that models the distribution of clean samples on the encoder's embedding manifold and corrects out-of-distribution embeddings back toward it. Extensive evaluations across three datasets show that this method not only establishes a new state of the art in adversarial robustness but also preserves, and even modestly improves, clean-data accuracy. This result matters for building text classification systems that withstand adversarial challenges without degrading everyday performance, broadening their applicability to real-world scenarios.
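The core idea of correcting out-of-distribution embeddings toward the clean-sample manifold can be illustrated with a simple distance-based heuristic. The sketch below is an assumption-laden stand-in, not the paper's causal-flow modules: it models the clean distribution with a mean and covariance, and pulls any embedding whose Mahalanobis distance exceeds a threshold back onto the threshold shell.

```python
import numpy as np

def purify_embedding(z, clean_mean, clean_cov, threshold=3.0):
    """Shrink an out-of-distribution embedding toward the clean-sample
    distribution. This is an illustrative Mahalanobis-distance heuristic,
    NOT the MC^2F causal-flow correction described in the paper.

    z          -- embedding vector to purify
    clean_mean -- mean of clean-sample embeddings
    clean_cov  -- covariance of clean-sample embeddings
    threshold  -- Mahalanobis radius considered "on-manifold"
    """
    cov_inv = np.linalg.inv(clean_cov)
    diff = z - clean_mean
    dist = float(np.sqrt(diff @ cov_inv @ diff))  # Mahalanobis distance
    if dist <= threshold:
        return z  # already within the clean region: leave untouched
    # Pull the embedding radially back to the threshold shell.
    return clean_mean + diff * (threshold / dist)
```

In this toy version, clean embeddings pass through unchanged, so clean-data behavior is preserved by construction; only suspected adversarial embeddings are moved, which mirrors the trade-off-breaking intuition of the summarized approach.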
— via World Pulse Now AI Editorial System


Recommended Readings
On the Relationship Between Adversarial Robustness and Decision Region in Deep Neural Networks
Positive · Artificial Intelligence
The article examines how Deep Neural Networks (DNNs) are evaluated on both generalization performance and robustness against adversarial attacks. It notes that generalization metrics alone are no longer sufficient for assessment, since DNN performance on them has reached state-of-the-art levels. The study introduces the Populated Region Set (PRS) to analyze the internal decision-region properties of DNNs that influence robustness, finding that a low PRS ratio correlates with improved adversarial robustness.
Learn to Select: Exploring Label Distribution Divergence for In-Context Demonstration Selection in Text Classification
Positive · Artificial Intelligence
The article discusses a novel approach to in-context learning (ICL) for text classification, emphasizing the importance of selecting appropriate demonstrations. Traditional methods often prioritize semantic similarity, neglecting label distribution alignment, which can impact performance. The proposed method, TopK + Label Distribution Divergence (L2D), utilizes a fine-tuned BERT-like small language model to generate label distributions and assess their divergence. This dual focus aims to enhance the effectiveness of demonstration selection in large language models (LLMs).
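The label-distribution idea behind L2D can be sketched with a standard divergence measure. The code below is a hypothetical illustration, not the paper's implementation: it assumes each candidate demonstration comes with a predicted label distribution (e.g., from a fine-tuned BERT-like model) and ranks candidates by KL divergence from the query's distribution, complementing the semantic-similarity step of TopK selection.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete label distributions.
    eps avoids log(0) for zero-probability labels."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def select_demonstrations(query_dist, candidates, k=4):
    """Pick the k demonstrations whose label distributions diverge least
    from the query's. An illustrative stand-in for the L2D criterion;
    `candidates` is a list of (text, label_distribution) pairs, assumed
    to be pre-filtered by TopK semantic similarity."""
    ranked = sorted(candidates, key=lambda c: kl_divergence(query_dist, c[1]))
    return [text for text, _ in ranked[:k]]
```

Ranking by divergence rather than similarity alone favors demonstrations whose label profile matches the query, which is the alignment gap the summarized method aims to close.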