Anthropic Finds LLMs Can Be Poisoned Using Small Number of Documents

InfoQ — AI, ML & Data EngineeringTuesday, November 11, 2025 at 2:00:00 PM
Anthropic Finds LLMs Can Be Poisoned Using Small Number of Documents
  • Anthropic's Alignment Science team released a study on poisoning attacks on LLM training, demonstrating that only 250 malicious examples are sufficient to create a backdoor vulnerability in LLMs. The findings suggest that these attacks become easier as models scale up, raising concerns about the security of AI systems.
  • This development is significant for Anthropic as it highlights potential vulnerabilities in their LLMs, prompting a need for enhanced security measures in AI training processes. The ability to exploit such weaknesses could undermine trust in AI technologies.
  • While no related articles were identified, the findings resonate with ongoing discussions in the AI community regarding the security and robustness of machine learning models, emphasizing the importance of safeguarding against potential threats.
— via World Pulse Now AI Editorial System

Was this article worth reading? Share it

Ready to build your own newsroom?

Subscribe to unlock a personalised feed, podcasts, newsletters, and notifications tailored to the topics you actually care about