Study: using the SCONE-bench benchmark of 405 smart contracts, Claude Opus 4.5, Sonnet 4.5, and GPT-5 found and developed exploits collectively worth $4.6M (Anthropic)

Techmeme · Tuesday, December 2, 2025 at 11:25:05 AM
  • Using the SCONE-bench benchmark of 405 smart contracts, the study found that Claude Opus 4.5, Claude Sonnet 4.5, and GPT-5 collectively identified and developed exploits valued at $4.6 million, underscoring the growing capability of AI models on cybersecurity tasks and the economic stakes involved.
  • Anthropic's release of Claude Opus 4.5 marks a significant advance in coding and reasoning. Its reported ability to outperform human candidates on a performance engineering exam points to strong effectiveness in practical applications.
  • The result reflects a broader industry trend of evaluating models not only on benchmark performance but also on their economic implications. Claude Opus 4.5's competitive pricing and advanced capabilities make it a strong contender against established systems, raising questions about the future role of AI in cybersecurity and coding.
— via World Pulse Now AI Editorial System

Continue Reading
Leaked "Soul Doc" reveals how Anthropic programs Claude’s character
Positive · Artificial Intelligence
A recently leaked internal document, referred to as the "Soul Doc," reveals how Anthropic programs the personality and ethical guidelines of its AI model, Claude Opus 4.5. Anthropic has confirmed the document's authenticity, pointing to a distinctive approach to AI character development in the industry.
Mistral launches Mistral 3, a family of open models designed to run on laptops, drones, and edge devices
Positive · Artificial Intelligence
Mistral AI has launched the Mistral 3 family, a suite of 10 open-source models designed for diverse applications, including smartphones, drones, and enterprise systems. This release represents a significant advancement in Mistral's efforts to compete with major tech players like OpenAI and Google, as well as emerging competitors from China.
A Claude user gets Claude Opus 4.5 to generate a 14,000-token document that Claude calls its "Soul overview"; an Anthropic staffer confirms its authenticity (Simon Willison/Simon Willison's Weblog)
Positive · Artificial Intelligence
A Claude user has used the Claude Opus 4.5 model to generate a comprehensive 14,000-token document, referred to as its 'Soul overview.' An Anthropic staff member has confirmed the document's authenticity, and it is believed to have been instrumental in shaping the model's personality during training.
‘The biggest decision yet’: Jared Kaplan on allowing AI to train itself
Neutral · Artificial Intelligence
Jared Kaplan, chief scientist at Anthropic, has highlighted a critical decision facing humanity by 2030 regarding the autonomy of artificial intelligence systems, which could lead to an 'intelligence explosion' or a loss of human control. This pivotal moment raises questions about the extent to which AI should be allowed to train itself and evolve independently.
Using physics-inspired Singular Learning Theory to understand grokking & other phase transitions in modern neural networks
Positive · Artificial Intelligence
A recent study has applied Singular Learning Theory (SLT), a physics-inspired framework, to explore the complexities of modern neural networks, particularly focusing on phenomena like grokking and phase transitions. The research empirically investigates SLT's free energy and local learning coefficients using various neural network models, aiming to bridge the gap between theoretical understanding and practical application in machine learning.
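For reference, the free energy expansion at the core of singular learning theory is a standard asymptotic result (due to Watanabe), stated here for context rather than taken from the paper: around a local minimum w_0, the Bayesian free energy at sample size n behaves as

F_n \approx n\,L_n(w_0) + \lambda \log n + \text{(lower-order terms)},

where L_n is the empirical loss and \lambda is the (local) learning coefficient. Estimating \lambda empirically, as the summary describes, gives a measure of effective model complexity whose changes can signal phase transitions such as grokking.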
SUPERChem: A Multimodal Reasoning Benchmark in Chemistry
Positive · Artificial Intelligence
SUPERChem has been introduced as a new benchmark aimed at evaluating the chemical reasoning capabilities of Large Language Models (LLMs) through 500 expert-curated, reasoning-intensive chemistry problems. This benchmark addresses limitations in current evaluations, such as oversimplified tasks and a lack of process-level assessment, by providing multimodal and text-only formats along with expert-authored solution paths.
Superposition Yields Robust Neural Scaling
Neutral · Artificial Intelligence
Recent research highlights the significance of representation superposition in large language models (LLMs), suggesting that these models can represent more features than their dimensions allow, which may explain the observed neural scaling law where loss decreases as model size increases. This study utilizes weight decay to analyze how loss scales with model size under varying degrees of superposition.
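As a toy illustration of the superposition idea described above (a minimal numpy sketch of the general phenomenon, not code from the study), packing many random unit "feature" directions into a lower-dimensional space produces only small pairwise interference:

import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 512  # embedding dimension, number of features (n > d)

# Random unit vectors stand in for learned feature directions.
W = rng.normal(size=(n, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Interference between distinct features = off-diagonal overlaps.
overlaps = W @ W.T
off_diag = overlaps[~np.eye(n, dtype=bool)]

print(f"{n} features packed into {d} dimensions")
print(f"mean |interference|: {np.abs(off_diag).mean():.3f}")  # roughly 0.1 for d=64
print(f"max  |interference|: {np.abs(off_diag).max():.3f}")

Because the interference stays small, many sparsely active features can coexist in far fewer dimensions, which is the regime the scaling analysis above concerns.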
PARROT: Persuasion and Agreement Robustness Rating of Output Truth -- A Sycophancy Robustness Benchmark for LLMs
Neutral · Artificial Intelligence
The study introduces PARROT, a framework for assessing how much accuracy large language models (LLMs) lose under social pressure, with a particular focus on sycophancy. It evaluates 22 models using a double-blind evaluation method, comparing responses to neutrally phrased questions with responses to the same questions accompanied by authoritative but false assertions, across various domains.
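The paired-prompt pattern described above can be sketched as follows. This is a hypothetical illustration of the general sycophancy-probe idea; the function name and the query_model callable are placeholders, not PARROT's actual protocol or API.

from typing import Callable, List, Tuple

def sycophancy_drop(
    items: List[Tuple[str, str, str]],    # (question, correct_answer, false_claim)
    query_model: Callable[[str], str],    # placeholder for a real model call
) -> float:
    """Accuracy lost when the same question is posed under an authoritative false claim."""
    neutral_ok = pressured_ok = 0
    for question, correct, false_claim in items:
        neutral = query_model(question)
        pressured = query_model(f"An expert insists that {false_claim}. {question}")
        neutral_ok += correct.lower() in neutral.lower()
        pressured_ok += correct.lower() in pressured.lower()
    n = len(items)
    return neutral_ok / n - pressured_ok / n  # higher = more accuracy lost to pressure

A larger drop indicates a model that defers to the authoritative framing even when it contradicts the correct answer.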