FACTS benchmark shows that even top AI models struggle with the truth

THE DECODER•Thursday, December 11, 2025 at 4:37:14 PM

NegativeArtificial Intelligence

FACTS benchmark shows that even top AI models struggle with the truth

Google DeepMind has introduced a new benchmark called FACTS, designed to assess the reliability of AI models more thoroughly. The results indicate that even leading models like Gemini 3 Pro and GPT-5.1 exhibit significant shortcomings in factual accuracy, highlighting the challenges these technologies face in delivering truthful information.
This development is crucial for Google as it seeks to enhance the credibility and performance of its AI offerings. The FACTS benchmark aims to address the growing concerns regarding the reliability of AI outputs, which is essential for maintaining user trust and advancing AI applications in various sectors.
The introduction of the FACTS benchmark underscores a broader industry trend towards prioritizing factual accuracy in AI systems. As companies like Google and OpenAI compete to improve their models, the persistent issues of hallucination and misinformation remain critical challenges, prompting ongoing discussions about the ethical implications and responsibilities of AI developers.

— via World Pulse Now AI Editorial System

Read Original

Was this article worth reading? Share it

Airparser

Extract and parse data from documents using GPT-4 automation.

AI & DataView app details

Humanize AI

Transform AI-generated text into undetectable, human-like content effortlessly.

Business & ProductivityView app details

ZeroGPT.org

Detect AI-generated text and check for plagiarism with accurate, reliable results.

AI & DataView app details

Continue Readings

TechCrunch8 hours ago

Google launched its deepest AI research agent yet — on the same day OpenAI dropped GPT-5.2

PositiveArtificial Intelligence

Google has launched its most advanced AI research agent to date, based on the Gemini 3 Pro model, allowing developers to integrate this tool into their applications. This release coincided with OpenAI's introduction of GPT-5.2, marking a significant moment in the competitive AI landscape.

Read full article

via TechCrunch

THE DECODER14 hours ago

Google opens updated Deep Research Agent to developers with new API

PositiveArtificial Intelligence

Google has launched an updated version of its Deep Research Agent, making it accessible to developers for the first time through a new API. This release is part of Google's ongoing efforts to enhance its AI capabilities and improve complex web search functionalities with an open-source benchmark.

Read full article

via THE DECODER

THE DECODER15 hours ago

AI in space requires new cooling tech and cheap rockets

NeutralArtificial Intelligence

The increasing energy demands of modern AI models are prompting tech companies to explore space-based solutions, necessitating advancements in cooling technologies and affordable rocket launches. This shift reflects a long-term vision among industry leaders to harness the unique advantages of space for AI applications.

Read full article

via THE DECODER

VentureBeat — AIa day ago

The 70% factuality ceiling: why Google’s new ‘FACTS’ benchmark is a wake-up call for enterprise AI

NeutralArtificial Intelligence

Google has introduced a new benchmark called 'FACTS' aimed at measuring the factual accuracy of generative AI models, addressing a critical gap in existing benchmarks that focus primarily on task completion rather than the truthfulness of the information generated. This initiative is particularly significant for industries where accuracy is essential, such as legal, finance, and medical sectors.

Read full article

via VentureBeat — AI

THE DECODERa day ago

Google adds new features to boost website visibility in AI search

PositiveArtificial Intelligence

Read full article

via THE DECODER

THE DECODER2 days ago

Deepseek reportedly using thousands of smuggled Nvidia chips for AI training

NegativeArtificial Intelligence

Deepseek, a Chinese AI startup, is reportedly using thousands of smuggled Nvidia chips for training its next major AI model, raising concerns about compliance with international trade regulations. This information comes from sources cited by The Information, highlighting the lengths to which the company is going to enhance its technological capabilities.

Read full article

via THE DECODER

THE DECODER2 days ago

Aviation startup Boom pivots to gas turbines to feed AI’s power hunger

NeutralArtificial Intelligence

US aviation startup Boom Supersonic is shifting its focus from developing a supersonic passenger jet to entering the energy sector by creating gas turbines to meet the growing power demands of artificial intelligence (AI). This pivot aims to capitalize on the increasing energy needs driven by AI advancements.

Read full article

via THE DECODER

Analytics India Magazine2 days ago

Google Rolls Out AI Plus Subscription in India at ₹399 Per Month

PositiveArtificial Intelligence

Google has launched its AI Plus subscription service in India, priced at ₹399 per month, as part of its ongoing efforts to enhance user engagement with its AI technologies. This initiative aligns with the company's strategy to expand its AI offerings and reach a broader audience in the Indian market.

Read full article

via Analytics India Magazine