The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

arXiv — cs.LG•Friday, November 21, 2025 at 5:00:00 AM

NeutralArtificial Intelligence

Large Reasoning Models (LRMs) have emerged, showcasing improved reasoning capabilities, yet their underlying mechanisms and limitations remain inadequately explored. This investigation highlights the need for a deeper understanding of how these models process information.
The significance of this research lies in its potential to refine the evaluation methods for LRMs, moving beyond mere accuracy to encompass the reasoning pathways that lead to conclusions.
The exploration of reasoning models is part of a broader discourse on artificial intelligence, where understanding the cognitive processes of AI systems is crucial for their effective application across various domains, including complex problem

— via World Pulse Now AI Editorial System

Read Original

Was this article worth reading? Share it

Continue Readings

arXiv — cs.CV2 days ago

An Image Is Worth Ten Thousand Words: Verbose-Text Induction Attacks on VLMs

PositiveArtificial Intelligence

The paper discusses the challenges associated with Vision-Language Models (VLMs) in generating lengthy outputs with low information density, which leads to increased energy consumption and costs. It introduces a novel verbose-text induction attack (VTIA) that uses adversarial perturbations to optimize output token length, addressing the limitations of existing methods that merely delay the end of output without maximizing length.

Read full article

via arXiv — cs.CV

arXiv — cs.LG2 days ago

An Interpretability-Guided Framework for Responsible Synthetic Data Generation in Emotional Text

NeutralArtificial Intelligence

The article presents an interpretability-guided framework for generating synthetic data in emotional text analysis, addressing the challenges of high costs and restrictions in accessing training data. Utilizing Shapley Additive Explanations (SHAP), the framework enhances the performance of large language models (LLMs) in emotion classification, particularly for underrepresented classes. However, it notes limitations in vocabulary richness and expression complexity in synthetic texts compared to authentic social media posts.

Read full article

via arXiv — cs.LG

arXiv — cs.CL2 days ago

QueryGym: A Toolkit for Reproducible LLM-Based Query Reformulation

PositiveArtificial Intelligence

QueryGym is a new Python toolkit designed for large language model (LLM)-based query reformulation. It aims to provide a unified framework that enhances retrieval effectiveness by allowing consistent implementation, execution, and comparison of various LLM-based methods. The toolkit includes a Python API, a retrieval-agnostic interface for integration with backends like Pyserini and PyTerrier, and a centralized prompt management system.

Read full article

via arXiv — cs.CL

arXiv — cs.LG2 days ago

FlipVQA-Miner: Cross-Page Visual Question-Answer Mining from Textbooks

PositiveArtificial Intelligence

The paper introduces FlipVQA-Miner, an automated pipeline designed to extract high-quality question-answer (QA) and visual question-answer (VQA) pairs from educational documents. This method combines layout-aware OCR with large language model (LLM)-based semantic parsing, addressing the challenge of transforming raw PDFs into AI-ready supervision. Experiments demonstrate that the approach yields accurate and aligned QA/VQA pairs, enhancing the utility of educational materials for training LLMs.

Read full article

via arXiv — cs.LG

arXiv — cs.CV2 days ago

T2T-VICL: Unlocking the Boundaries of Cross-Task Visual In-Context Learning via Implicit Text-Driven VLMs

PositiveArtificial Intelligence

The paper introduces T2T-VICL, a collaborative pipeline designed to explore cross-task visual in-context learning (VICL) using vision-language models (VLMs). It focuses on generating and selecting text prompts that effectively describe differences between distinct low-level vision tasks. The study also presents a novel inference framework that integrates perceptual reasoning with traditional evaluation metrics, aiming to enhance the capabilities of VLMs in handling diverse visual tasks.

Read full article

via arXiv — cs.CV

arXiv — cs.LG2 days ago

Evolution Strategies at the Hyperscale

PositiveArtificial Intelligence

The paper introduces Evolution Guided General Optimization via Low-rank Learning (EGGROLL), an evolution strategies (ES) algorithm aimed at optimizing large neural networks with billions of parameters. EGGROLL addresses the high computational costs of traditional ES by using low-rank matrix perturbations, enabling efficient backpropagation-free optimization for large populations.

Read full article

via arXiv — cs.LG

arXiv — cs.LG2 days ago

Node-Level Uncertainty Estimation in LLM-Generated SQL

PositiveArtificial Intelligence

A new framework for detecting errors in SQL generated by large language models (LLMs) has been introduced, focusing on estimating uncertainty at the node level in the query's abstract syntax tree (AST). The method employs a semantically aware labeling algorithm to assess node correctness and utilizes a supervised classifier to predict error probabilities, providing detailed diagnostics for potential query errors. This approach significantly outperforms traditional token log-probabilities across various databases.

Read full article

via arXiv — cs.LG

arXiv — cs.CL2 days ago

Comparison of Text-Based and Image-Based Retrieval in Multimodal Retrieval Augmented Generation Large Language Model Systems

PositiveArtificial Intelligence

Recent advancements in Retrieval-Augmented Generation (RAG) have enabled Large Language Models (LLMs) to access multimodal knowledge bases containing both text and visual information. Current multimodal RAG systems convert images into text during preprocessing, which results in the loss of contextual information. This study presents a comparative analysis of text-based chunk retrieval and direct multimodal embedding retrieval, evaluating their effectiveness across six LLM models.

Read full article

via arXiv — cs.CL