I compared Claude Opus 4.8 with 4.7 in a 10-round honesty test - and a legal prompt broke it

ZDNET — Artificial IntelligenceWednesday, June 3, 2026 at 1:07:00 AM
  • What Happened

    A recent comparison of Claude Opus 4.8 and its predecessor 4.7 revealed that the latest model struggled with a legal prompt during a 10-round honesty test, highlighting potential weaknesses in its performance. The evaluation included various challenges across coding, medical, finance, and legal domains, with results cross-checked against multiple AI systems.

  • Why It Matters

    This development is significant for Anthropic, the company behind Claude Opus, as it underscores the importance of model reliability and transparency, particularly in sensitive areas like legal advice where accuracy is paramount.

  • The Bigger Picture

    The findings reflect ongoing discussions in the AI community regarding the balance between enhanced capabilities and the risks of over-cautiousness or inaccuracies in AI responses, as seen in previous versions like 4.7, which faced criticism for being overly restrictive in its safeguards.

— via World Pulse Now AI Editorial System

Was this article worth reading? Share it

Ready to build your own newsroom?

Subscribe to unlock a personalised feed, podcasts, newsletters, and notifications tailored to the topics you actually care about