Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities

arXiv — cs.CL · Thursday, November 20, 2025 at 5:00:00 AM
  • Recent evaluations of Large Reasoning Models (LRMs) indicate that while they excel in specialized reasoning tasks, the incorporation of deliberative reasoning capabilities compromises their foundational abilities, resulting in decreased helpfulness and increased costs.
  • This development is crucial for companies like OpenAI and DeepSeek, as it underscores the trade-off between deliberative reasoning depth and the foundational capabilities that everyday use depends on; a measurement harness illustrating this trade-off is sketched after this summary.
  • The findings reflect ongoing challenges in AI, particularly the balance between complex reasoning and practical usability, as other studies also explore the limitations of LLMs in various contexts, emphasizing the need for adaptive strategies.
— via World Pulse Now AI Editorial System
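
To make the reported trade-off concrete, the harness below shows one way to tally accuracy against average token cost in a direct mode and a deliberative mode. It is a minimal sketch, not the paper's evaluation setup: `query_model`, its `deliberate` flag, and the toy benchmark are hypothetical stand-ins for a real LLM client.

```python
from dataclasses import dataclass

@dataclass
class Result:
    answer: str
    tokens_used: int

def query_model(prompt: str, deliberate: bool) -> Result:
    """Toy stand-in for a real LLM client; replace with an actual API call.
    Deliberate mode would emit a long reasoning trace first, so it is
    modeled here as costing several times more tokens."""
    tokens = len(prompt.split()) * (8 if deliberate else 2)
    return Result(answer="42", tokens_used=tokens)

def trade_off(benchmark: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Tally accuracy and average token cost with and without deliberation."""
    stats: dict[str, dict[str, float]] = {}
    for mode, label in ((False, "direct"), (True, "deliberate")):
        correct = tokens = 0
        for prompt, gold in benchmark:
            res = query_model(prompt, deliberate=mode)
            correct += int(res.answer.strip() == gold)
            tokens += res.tokens_used
        stats[label] = {
            "accuracy": correct / len(benchmark),
            "avg_tokens": tokens / len(benchmark),
        }
    return stats

bench = [("What is 6 * 7?", "42"), ("Name the capital of France.", "Paris")]
print(trade_off(bench))
```

Under this framing, the paper's finding would surface as a much higher `avg_tokens` (with no accuracy gain, or even a loss) in the deliberate mode on simple, non-reasoning tasks.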


Recommended Readings
Indian IT Faces Threat From AI Coding Tools from OpenAI & Google, Says IndiaAI CEO
Negative · Artificial Intelligence
At the Bengaluru Tech Summit, IndiaAI CEO Singh highlighted the growing threat posed by AI coding tools from OpenAI and Google to India's software services sector. He emphasized that these advancements could undermine India's competitive edge in IT services, which has been a significant contributor to the country's economy.
Deterministic RAG: A Drop-in Replacement for GraphRAG’s Unstable Planning
Positive · Artificial Intelligence
The article discusses the development of a deterministic RAG (Retrieval-Augmented Generation) system designed to replace GraphRAG's unstable planning. Current RAG systems face issues with reproducibility and debugging due to their reliance on LLM-driven dynamic planning. The new deterministic approach aims to enhance stability and auditability while maintaining the system's generative capabilities.
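The summary does not spell out the planning mechanism, but the core idea, making the retrieval plan a pure function of the query, can be sketched briefly. The rule-based decomposition, term-overlap scoring, and hash tiebreaks below are illustrative assumptions, not the article's actual algorithm:

```python
import hashlib

def plan_queries(question: str) -> list[str]:
    """Deterministic decomposition: fixed string rules, no LLM in the loop."""
    parts = [p.strip() for p in question.replace("?", "").split(" and ")]
    return sorted(set(parts))  # stable order => reproducible retrieval

def retrieve(corpus: dict[str, str], sub_query: str, k: int = 3) -> list[str]:
    """Rank documents by term overlap; break ties with a stable hash."""
    terms = set(sub_query.lower().split())
    def score(doc_id: str) -> tuple[int, str]:
        overlap = len(terms & set(corpus[doc_id].lower().split()))
        return (-overlap, hashlib.sha256(doc_id.encode()).hexdigest())
    return sorted(corpus, key=score)[:k]

def build_context(corpus: dict[str, str], question: str) -> list[str]:
    """Assemble a context that is a pure function of (corpus, question)."""
    seen: set[str] = set()
    context: list[str] = []
    for sq in plan_queries(question):
        for doc_id in retrieve(corpus, sq):
            if doc_id not in seen:
                seen.add(doc_id)
                context.append(doc_id)
    return context

docs = {
    "d1": "GraphRAG builds a graph over entities and relations",
    "d2": "deterministic planning makes retrieval reproducible",
    "d3": "LLM driven planning varies between runs",
}
print(build_context(docs, "Why is deterministic planning reproducible and auditable?"))
```

Because every step is deterministic, two runs on the same query retrieve identical context in identical order, which is what makes such a pipeline reproducible and auditable.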
ConInstruct: Evaluating Large Language Models on Conflict Detection and Resolution in Instructions
Neutral · Artificial Intelligence
ConInstruct is a benchmark designed to evaluate Large Language Models (LLMs) on their ability to detect and resolve conflicts in user instructions. While many existing assessments focus on adherence to instructions, ConInstruct addresses the often-overlooked scenarios where conflicting constraints arise. Initial evaluations show that proprietary LLMs generally perform well in conflict detection, with DeepSeek-R1 and Claude-4.5-Sonnet achieving the highest F1-scores.
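For reference, the F1-score cited above combines precision and recall over binary predictions. A minimal sketch, assuming one "contains a conflict" label per instruction (ConInstruct's exact protocol may differ):

```python
def f1_score(predicted: list[bool], gold: list[bool]) -> float:
    """F1 over binary conflict-detection labels."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# tp=2, fp=1, fn=0 -> precision 0.667, recall 1.0, F1 0.8
print(round(f1_score([True, True, False, True], [True, False, False, True]), 3))
```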
Spot The Ball: A Benchmark for Visual Social Inference
Neutral · Artificial Intelligence
The article introduces 'Spot The Ball', a benchmark designed to evaluate visual social inference in vision-language models (VLMs) using sports imagery. The task involves localizing a missing sports ball in images from soccer, basketball, and volleyball. The study compares human performance against four advanced VLMs, revealing that humans are significantly more accurate, achieving 20-34% accuracy compared to the models' maximum of 17%. This highlights the limitations of current AI in understanding complex visual cues.
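The digest does not state the scoring rule, but localization benchmarks of this kind typically count a prediction as correct when it falls within a tolerance radius of the true ball position. A minimal sketch under that assumption, with an illustrative pixel threshold:

```python
import math

def localization_accuracy(
    predictions: list[tuple[float, float]],
    ground_truth: list[tuple[float, float]],
    tolerance: float = 50.0,  # pixels; an illustrative threshold
) -> float:
    """Fraction of predicted ball positions within `tolerance` of the truth."""
    hits = sum(
        math.hypot(px - gx, py - gy) <= tolerance
        for (px, py), (gx, gy) in zip(predictions, ground_truth)
    )
    return hits / len(predictions)

# Two of these three guesses land inside the tolerance radius -> 0.667
print(localization_accuracy([(10, 10), (300, 200), (95, 40)],
                            [(12, 14), (310, 190), (400, 400)]))
```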
GRPO-RM: Fine-Tuning Representation Models via GRPO-Driven Reinforcement Learning
Positive · Artificial Intelligence
The paper presents Group Relative Policy Optimization for Representation Model (GRPO-RM), a reinforcement learning method aimed at fine-tuning large language models (LLMs). It establishes a predefined output set to replace token sequence sampling, facilitating the generation of an output group essential for GRPO's optimization. A specialized reward function is also introduced to cater to representation models, with extensive experiments validating the method's effectiveness across various real-world datasets.
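The group-relative normalization that gives GRPO its name can be shown compactly. This sketch assumes the standard mean/std normalization over a group of rewards from the original GRPO formulation; GRPO-RM's specialized reward function is only a placeholder here:

```python
import math

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each candidate output relative to its group's statistics."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # guard: a degenerate group gets zero advantages
    return [(r - mean) / std for r in rewards]

# Rewards for a group of candidates drawn from a predefined output set
# (the reward values are made up for illustration):
print(group_relative_advantages([0.2, 0.9, 0.5, 0.4]))
```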
Investigating Hallucination in Conversations for Low Resource Languages
Neutral · Artificial Intelligence
Large Language Models (LLMs) have shown exceptional ability in text generation but often produce factually incorrect statements, known as 'hallucinations'. This study investigates hallucinations in conversational data across three low-resource languages: Hindi, Farsi, and Mandarin. The analysis of various LLMs, including GPT-3.5 and GPT-4o, reveals that while Mandarin has few hallucinated responses, Hindi and Farsi exhibit significantly higher rates of inaccuracies.
OpenAI Board Member Resigns After Deep Connections to Epstein Exposed
Negative · Artificial Intelligence
Larry Summers has resigned from the board of OpenAI following the exposure of his deep connections with Jeffrey Epstein. Summers acknowledged his association with Epstein as a significant error in judgment. This resignation comes amid growing scrutiny over his past associations and public commitments.
OpenAI made a free version of ChatGPT for teachers
Neutral · Artificial Intelligence
OpenAI has launched a free version of ChatGPT specifically designed for teachers. This initiative aims to provide educators with accessible tools to enhance their teaching methods and engage students more effectively. The move is part of OpenAI's broader strategy to support educational professionals in leveraging AI technology.