From Proof to Program: Characterizing Tool-Induced Reasoning Hallucinations in Large Language Models

arXiv — cs.CL · Monday, November 17, 2025 at 5:00:00 AM
The article examines the limitations of Tool-augmented Language Models (TaLMs) on reasoning tasks. It identifies a phenomenon termed Tool-Induced Myopia (TIM), in which TaLMs achieve a 19.3 percentage-point gain in final-answer accuracy yet produce solutions that lack coherent justification. Using the PYMATH benchmark of 1,679 mathematical problems, the study shows that TaLMs often treat tool outputs as substitutes for reasoning, leading to a deterioration in their reasoning behavior.
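The gap the study describes can be illustrated with a toy sketch: a solution can score perfectly on final-answer accuracy while its trace contains no justified reasoning at all. Everything below is hypothetical and illustrative; the function names, trace format, and scoring rule are assumptions, not the paper's PYMATH evaluation.

```python
# Hypothetical sketch of Tool-Induced Myopia (TIM): final-answer
# accuracy can be high even when the reasoning trace merely echoes
# a tool's output. All names and the trace schema are illustrative.

def final_answer_accuracy(predictions, answers):
    """Fraction of problems whose final answer matches the reference."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

def derivation_score(trace):
    """Toy proxy for reasoning coherence: the share of steps that are
    justified by the model itself rather than copied from a tool."""
    steps = [s for s in trace if s["kind"] == "step"]
    justified = [s for s in steps if not s.get("copied_from_tool")]
    return len(justified) / len(steps) if steps else 0.0

# A "correct" but myopic solution: the answer comes straight from a
# tool call, with no supporting derivation in between.
trace = [
    {"kind": "step", "text": "call solver on the equation", "copied_from_tool": True},
    {"kind": "step", "text": "report the tool's result as the answer", "copied_from_tool": True},
]
print(final_answer_accuracy(["42"], ["42"]))  # 1.0 — the answer is right
print(derivation_score(trace))                # 0.0 — but nothing is justified
```

Scoring the two axes separately is one simple way to surface the mismatch the paper points at: accuracy alone rewards the tool call, while a coherence-style score penalizes the missing justification.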
— via World Pulse Now AI Editorial System
