Steganographic Backdoor Attacks in NLP: Ultra-Low Poisoning and Defense Evasion

arXiv — cs.CLWednesday, November 19, 2025 at 5:00:00 AM
  • Transformer models in NLP are susceptible to backdoor attacks that utilize poisoned data to embed hidden behaviors during training, as revealed by recent research. The introduction of SteganoBackdoor aims to address this vulnerability by employing natural
  • The implications of this development are significant, as it underscores the need for improved defenses against backdoor attacks in NLP systems. By focusing on semantic triggers, the research highlights the potential for real
— via World Pulse Now AI Editorial System

Was this article worth reading? Share it

Recommended Readings
Soft-Label Training Preserves Epistemic Uncertainty
PositiveArtificial Intelligence
The article discusses the concept of soft-label training in machine learning, which preserves epistemic uncertainty by treating annotation distributions as ground truth. Traditional methods often collapse diverse human judgments into single labels, leading to misalignment between model certainty and human perception. Empirical results show that soft-label training reduces KL divergence from human annotations by 32% and enhances correlation between model and annotation entropy by 61%, while maintaining accuracy comparable to hard-label training.
Anti-adversarial Learning: Desensitizing Prompts for Large Language Models
PositiveArtificial Intelligence
The paper discusses the importance of privacy preservation in user prompts for large language models (LLMs), highlighting the risks of exposing sensitive data. Traditional privacy techniques face limitations due to computational costs and user participation. The authors introduce PromptObfus, a method that employs anti-adversarial learning to obscure sensitive terms in prompts while maintaining model prediction stability. This approach uses a masked language modeling task to replace privacy-sensitive words with a [MASK] token, aiming to enhance user privacy.
Bias in, Bias out: Annotation Bias in Multilingual Large Language Models
NeutralArtificial Intelligence
Annotation bias in NLP datasets poses significant challenges for the development of multilingual Large Language Models (LLMs), especially in culturally diverse contexts. Factors such as task framing, annotator subjectivity, and cultural mismatches can lead to distorted model outputs and increased social harms. A comprehensive framework is proposed to understand annotation bias, which includes instruction bias, annotator bias, and contextual and cultural bias. The article reviews detection methods and suggests mitigation strategies.
Theories of "Sexuality" in Natural Language Processing Bias Research
NeutralArtificial Intelligence
Recent advancements in Natural Language Processing (NLP) have led to the widespread use of language models, prompting research into the reflection and amplification of social biases, including gender and racial bias. However, there is a notable gap in the analysis of how queer sexualities are represented in NLP systems. A survey of 55 articles reveals that sexuality is often poorly defined, relying on normative assumptions about sexual and romantic identities, which raises concerns about the operationalization of sexuality in NLP bias research.
FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-World LoRA
PositiveArtificial Intelligence
The article presents FedALT, a new algorithm for federated fine-tuning of large language models (LLMs) that addresses the challenges of cross-client interference and data heterogeneity. Traditional methods, primarily based on FedAvg, often lead to suboptimal personalization due to model aggregation issues. FedALT allows each client to continue training its individual LoRA while integrating knowledge from a separate Rest-of-World (RoW) LoRA component. This approach includes an adaptive mixer to balance local adaptation with global information effectively.