Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities

arXiv — cs.CL · Thursday, November 20, 2025 at 5:00:00 AM
  • Recent evaluations of Large Reasoning Models (LRMs) indicate that while they excel in specialized reasoning tasks, the incorporation of deliberative reasoning capabilities compromises their foundational abilities, resulting in decreased helpfulness and increased costs.
  • This development is crucial for companies like OpenAI and DeepSeek, as it underscores the trade-off between deliberative reasoning depth and the foundational capabilities that everyday use depends on; a measurement harness illustrating this trade-off is sketched after this summary.
  • The findings reflect ongoing challenges in AI, particularly the balance between complex reasoning and practical usability, as other studies also explore the limitations of LLMs in various contexts, emphasizing the need for adaptive strategies.
— via World Pulse Now AI Editorial System
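
To make the reported trade-off concrete, the harness below shows one way to tally accuracy against average token cost in a direct mode and a deliberative mode. It is a minimal sketch, not the paper's evaluation setup: `query_model`, its `deliberate` flag, and the toy benchmark are hypothetical stand-ins for a real LLM client.

```python
from dataclasses import dataclass

@dataclass
class Result:
    answer: str
    tokens_used: int

def query_model(prompt: str, deliberate: bool) -> Result:
    """Toy stand-in for a real LLM client; replace with an actual API call.
    Deliberate mode would emit a long reasoning trace first, so it is
    modeled here as costing several times more tokens."""
    tokens = len(prompt.split()) * (8 if deliberate else 2)
    return Result(answer="42", tokens_used=tokens)

def trade_off(benchmark: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Tally accuracy and average token cost with and without deliberation."""
    stats: dict[str, dict[str, float]] = {}
    for mode, label in ((False, "direct"), (True, "deliberate")):
        correct = tokens = 0
        for prompt, gold in benchmark:
            res = query_model(prompt, deliberate=mode)
            correct += int(res.answer.strip() == gold)
            tokens += res.tokens_used
        stats[label] = {
            "accuracy": correct / len(benchmark),
            "avg_tokens": tokens / len(benchmark),
        }
    return stats

bench = [("What is 6 * 7?", "42"), ("Name the capital of France.", "Paris")]
print(trade_off(bench))
```

Under this framing, the paper's finding would surface as a much higher `avg_tokens` (with no accuracy gain, or even a loss) in the deliberate mode on simple, non-reasoning tasks.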


Recommended Readings
Indian IT Faces Threat From AI Coding Tools from OpenAI & Google, Says IndiaAI CEO
Negative · Artificial Intelligence
At the Bengaluru Tech Summit, IndiaAI CEO Singh highlighted the growing threat posed by AI coding tools from OpenAI and Google to India's software services sector. He emphasized that these advancements could undermine India's competitive edge in IT services, which has been a significant contributor to the country's economy.
Deterministic RAG: A Drop-in Replacement for GraphRAG’s Unstable Planning
Positive · Artificial Intelligence
The article discusses the development of a deterministic RAG (Retrieval-Augmented Generation) system designed to replace GraphRAG's unstable planning. Current RAG systems face issues with reproducibility and debugging due to their reliance on LLM-driven dynamic planning. The new deterministic approach aims to enhance stability and auditability while maintaining the system's generative capabilities.
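The summary does not spell out the planning mechanism, but the core idea, making the retrieval plan a pure function of the query, can be sketched briefly. The rule-based decomposition, term-overlap scoring, and hash tiebreaks below are illustrative assumptions, not the article's actual algorithm:

```python
import hashlib

def plan_queries(question: str) -> list[str]:
    """Deterministic decomposition: fixed string rules, no LLM in the loop."""
    parts = [p.strip() for p in question.replace("?", "").split(" and ")]
    return sorted(set(parts))  # stable order => reproducible retrieval

def retrieve(corpus: dict[str, str], sub_query: str, k: int = 3) -> list[str]:
    """Rank documents by term overlap; break ties with a stable hash."""
    terms = set(sub_query.lower().split())
    def score(doc_id: str) -> tuple[int, str]:
        overlap = len(terms & set(corpus[doc_id].lower().split()))
        return (-overlap, hashlib.sha256(doc_id.encode()).hexdigest())
    return sorted(corpus, key=score)[:k]

def build_context(corpus: dict[str, str], question: str) -> list[str]:
    """Assemble a context that is a pure function of (corpus, question)."""
    seen: set[str] = set()
    context: list[str] = []
    for sq in plan_queries(question):
        for doc_id in retrieve(corpus, sq):
            if doc_id not in seen:
                seen.add(doc_id)
                context.append(doc_id)
    return context

docs = {
    "d1": "GraphRAG builds a graph over entities and relations",
    "d2": "deterministic planning makes retrieval reproducible",
    "d3": "LLM driven planning varies between runs",
}
print(build_context(docs, "Why is deterministic planning reproducible and auditable?"))
```

Because every step is deterministic, two runs on the same query retrieve identical context in identical order, which is what makes such a pipeline reproducible and auditable.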
ConInstruct: Evaluating Large Language Models on Conflict Detection and Resolution in Instructions
Neutral · Artificial Intelligence
ConInstruct is a benchmark designed to evaluate Large Language Models (LLMs) on their ability to detect and resolve conflicts in user instructions. While many existing assessments focus on adherence to instructions, ConInstruct addresses the often-overlooked scenarios where conflicting constraints arise. Initial evaluations show that proprietary LLMs generally perform well in conflict detection, with DeepSeek-R1 and Claude-4.5-Sonnet achieving the highest F1-scores.
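For reference, the F1-score cited above combines precision and recall over binary predictions. A minimal sketch, assuming one "contains a conflict" label per instruction (ConInstruct's exact protocol may differ):

```python
def f1_score(predicted: list[bool], gold: list[bool]) -> float:
    """F1 over binary conflict-detection labels."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# tp=2, fp=1, fn=0 -> precision 0.667, recall 1.0, F1 0.8
print(round(f1_score([True, True, False, True], [True, False, False, True]), 3))
```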
Spot The Ball: A Benchmark for Visual Social Inference
Neutral · Artificial Intelligence
The article introduces 'Spot The Ball', a benchmark designed to evaluate visual social inference in vision-language models (VLMs) using sports imagery. The task involves localizing a missing sports ball in images from soccer, basketball, and volleyball. The study compares human performance against four advanced VLMs, revealing that humans are significantly more accurate, achieving 20-34% accuracy compared to the models' maximum of 17%. This highlights the limitations of current AI in understanding complex visual cues.
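The digest does not state the scoring rule, but localization benchmarks of this kind typically count a prediction as correct when it falls within a tolerance radius of the true ball position. A minimal sketch under that assumption, with an illustrative pixel threshold:

```python
import math

def localization_accuracy(
    predictions: list[tuple[float, float]],
    ground_truth: list[tuple[float, float]],
    tolerance: float = 50.0,  # pixels; an illustrative threshold
) -> float:
    """Fraction of predicted ball positions within `tolerance` of the truth."""
    hits = sum(
        math.hypot(px - gx, py - gy) <= tolerance
        for (px, py), (gx, gy) in zip(predictions, ground_truth)
    )
    return hits / len(predictions)

# Two of these three guesses land inside the tolerance radius -> 0.667
print(localization_accuracy([(10, 10), (300, 200), (95, 40)],
                            [(12, 14), (310, 190), (400, 400)]))
```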
GRPO-RM: Fine-Tuning Representation Models via GRPO-Driven Reinforcement Learning
Positive · Artificial Intelligence
The paper presents Group Relative Policy Optimization for Representation Model (GRPO-RM), a reinforcement learning method aimed at fine-tuning large language models (LLMs). It establishes a predefined output set to replace token sequence sampling, facilitating the generation of an output group essential for GRPO's optimization. A specialized reward function is also introduced to cater to representation models, with extensive experiments validating the method's effectiveness across various real-world datasets.
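The group-relative normalization that gives GRPO its name can be shown compactly. This sketch assumes the standard mean/std normalization over a group of rewards from the original GRPO formulation; GRPO-RM's specialized reward function is only a placeholder here:

```python
import math

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each candidate output relative to its group's statistics."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # guard: a degenerate group gets zero advantages
    return [(r - mean) / std for r in rewards]

# Rewards for a group of candidates drawn from a predefined output set
# (the reward values are made up for illustration):
print(group_relative_advantages([0.2, 0.9, 0.5, 0.4]))
```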
Investigating Hallucination in Conversations for Low Resource Languages
Neutral · Artificial Intelligence
Large Language Models (LLMs) have shown exceptional ability in text generation but often produce factually incorrect statements, known as 'hallucinations'. This study investigates hallucinations in conversational data across three low-resource languages: Hindi, Farsi, and Mandarin. The analysis of various LLMs, including GPT-3.5 and GPT-4o, reveals that while Mandarin has few hallucinated responses, Hindi and Farsi exhibit significantly higher rates of inaccuracies.
OpenAI Board Member Resigns After Deep Connections to Epstein Exposed
Negative · Artificial Intelligence
Larry Summers has resigned from the board of OpenAI following the exposure of his deep connections with Jeffrey Epstein. Summers acknowledged his association with Epstein as a significant error in judgment. This resignation comes amid growing scrutiny over his past associations and public commitments.
OpenAI made a free version of ChatGPT for teachers
Neutral · Artificial Intelligence
OpenAI has launched a free version of ChatGPT specifically designed for teachers. This initiative aims to provide educators with accessible tools to enhance their teaching methods and engage students more effectively. The move is part of OpenAI's broader strategy to support educational professionals in leveraging AI technology.