From Proof to Program: Characterizing Tool-Induced Reasoning Hallucinations in Large Language Models
Negative · Artificial Intelligence
The article examines a failure mode of Tool-augmented Language Models (TaLMs) on reasoning tasks, termed Tool-Induced Myopia (TIM): even as TaLMs gain 19.3 percentage points in final-answer accuracy, they produce solutions whose intermediate reasoning no longer coherently justifies the result. Using the PYMATH benchmark of 1,679 mathematical problems, the study shows that TaLMs tend to treat tool outputs as substitutes for reasoning rather than as aids to it, and that their reasoning behavior deteriorates accordingly.
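To make the pattern concrete, here is a minimal, hypothetical sketch of what such a tool-myopic trace can look like, contrasted with a reasoned one. The toy problem, the `run_tool` helper, and both solution strings are invented for illustration; this is not the paper's evaluation code or the PYMATH harness.

```python
# Hypothetical illustration of Tool-Induced Myopia (TIM). All names and the
# toy problem below are invented for this sketch, not taken from the paper.

def run_tool(code: str) -> str:
    """Stand-in for a Python execution tool: run code, return its 'result'."""
    scope: dict = {}
    exec(code, scope)  # a real TaLM would sandbox tool execution
    return str(scope["result"])

# Toy problem: how many integers in [1, 1000] are divisible by 3 or 5?

# Tool-myopic trace: the tool output is presented *as* the answer,
# with no derivation a reader could check.
tool_answer = run_tool(
    "result = sum(1 for n in range(1, 1001) if n % 3 == 0 or n % 5 == 0)"
)
myopic_solution = f"The tool returned {tool_answer}, so the answer is {tool_answer}."

# Reasoned trace: inclusion-exclusion, checkable step by step, with the
# tool needed at most to verify the final arithmetic.
reasoned_solution = (
    "By inclusion-exclusion: floor(1000/3) + floor(1000/5) - floor(1000/15) "
    f"= 333 + 200 - 66 = {333 + 200 - 66}."
)

print(myopic_solution)    # correct final answer, no justification
print(reasoned_solution)  # same answer, with a coherent derivation
```

Both traces end at the same number; the difference the paper highlights is that only the second leaves a justification that can be verified without rerunning the tool.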
— via World Pulse Now AI Editorial System