Gemini 3 Pro tops new AI reliability benchmark, but hallucination rates remain high

THE DECODER | Wednesday, November 19, 2025 at 3:57:04 PM
  • A recent benchmark from Artificial Analysis shows that Google's Gemini 3 Pro ranks first among the large language models tested, yet it still hallucinates at a high rate, raising concerns about its factual reliability.
  • This development is crucial for Google as it positions Gemini 3 Pro as a top performer in AI reliability, which could enhance its competitive edge in the rapidly evolving AI landscape.
  • The findings underscore ongoing challenges in AI development, particularly regarding the balance between advanced capabilities and the accuracy of information provided by these models.
— via World Pulse Now AI Editorial System


Recommended Readings
OpenAI releases GPT-5.1-Codex-Max to handle engineering tasks that span twenty-four hours
Positive | Artificial Intelligence
OpenAI has launched GPT-5.1-Codex-Max, an advanced model aimed at managing extensive context and executing complex engineering tasks that may take up to twenty-four hours. This update enhances the coding environment, allowing for improved efficiency and problem-solving capabilities in software engineering.
OpenAI debuts GPT‑5.1-Codex-Max coding model and it already completed a 24-hour task internally
Positive | Artificial Intelligence
OpenAI has introduced GPT‑5.1-Codex-Max, a new coding model available in its Codex developer environment. This model enhances AI-assisted software engineering with improved reasoning, efficiency, and interactive capabilities. It replaces GPT‑5.1-Codex as the default model, designed to handle complex software development tasks across multiple contexts.
The Google Search of AI agents? Fetch launches ASI:One and Business tier for new era of non-human web
Positive | Artificial Intelligence
Fetch AI, a startup led by Humayun Sheikh, has launched three interconnected products aimed at enhancing the ecosystem of AI agents. The new offerings include ASI:One, a personal-AI orchestration platform, Fetch Business, a verification and discovery portal for brand agents, and Agentverse, an open directory with over two million agents. This initiative seeks to establish a robust infrastructure for what Fetch describes as the 'Agentic Web.'
Deepmind CEO Hassabis: World models are the future, but the AI bubble is real
Neutral | Artificial Intelligence
DeepMind CEO Demis Hassabis has emphasized the importance of world models in the future of AI, while cautioning about a potential bubble in the artificial intelligence market. With the launch of Gemini 3 Pro, Google aims to strengthen its position in AI leadership, reflecting a long-term strategy that is beginning to yield results.
Google's Gemini 3 model keeps the AI hype train going – for now
Neutral | Artificial Intelligence
Google's latest AI model, Gemini 3, reportedly surpasses competitors in various benchmark tests. However, concerns about its reliability persist, raising questions about the sustainability of the current AI hype.
Elon Musk's xAI in talks to raise $15 billion at $230 billion valuation
Positive | Artificial Intelligence
Elon Musk's AI company xAI is reportedly in advanced discussions to raise $15 billion in funding, which would elevate its valuation to $230 billion, according to the Wall Street Journal. This funding round reflects the company's rapid growth and ambition in the artificial intelligence sector.
Larry Summers resigns from OpenAI's board after release of emails with Jeffrey Epstein
Negative | Artificial Intelligence
Larry Summers has resigned from the board of OpenAI following the release of his email exchanges with Jeffrey Epstein. The decision comes amid growing scrutiny over his past associations and public commitments.
Agent 365: Microsoft launches management platform for AI agents
Positive | Artificial Intelligence
Microsoft has launched Agent 365, a new platform aimed at helping organizations manage their AI agents as integral parts of their workforce. This initiative is designed to enhance operational efficiency by allowing businesses to deploy and oversee AI agents similarly to human employees.