arXiv:2511.08052v1 Announce Type: cross 
Abstract: Recent LLMs have demonstrated sophisticated problem-solving capabilities on various benchmarks through advanced reasoning algorithms. However, the key research question of how to identify reasoning steps that balance complexity and computational efficiency remains open. Recent research has increasingly drawn on psychological theories to explore strategies for optimizing cognitive pathways, regarding the LLM's final outputs as System 1 and its intermediate steps as System 2. An in-depth exploration of System 2 reasoning is still lacking. We therefore propose a novel, psychologically grounded Scaffold Reasoning framework for code debugging, which comprises the Scaffold Stream, the Analytic Stream, and the Integration Stream. The Scaffold Stream constructs reference code, the Analytic Stream analyzes the buggy code, and the Integration Stream combines the two results. Our framework achieves an 88.91% pass rate and an average inference time of 5.36 seconds per problem on DebugBench, outperforming other reasoning approaches across various LLMs in both reasoning accuracy and efficiency. Further analyses elucidate the advantages and limitations of various cognitive pathways across problem difficulties and bug types. Our findings also corroborate the alignment of the proposed Scaffold Reasoning framework with human cognitive processes.
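The three-stream flow described in the abstract can be pictured with a minimal Python sketch. Everything here is an assumption made for illustration: the abstract does not give prompts, function names, or an API, so `LLM`, `scaffold_stream`, `analytic_stream`, `integration_stream`, `scaffold_debug`, and `pass_rate` are hypothetical, and the prompt wording is invented. It shows only the data flow (reference code and bug analysis feeding an integration step), not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class DebugResult:
    reference_code: str  # output of the Scaffold Stream
    analysis: str        # output of the Analytic Stream
    fixed_code: str      # output of the Integration Stream


def scaffold_stream(llm: LLM, problem: str) -> str:
    # Assumed prompt: build reference code from the problem statement alone.
    return llm(f"[scaffold]\nWrite reference code for this problem:\n{problem}")


def analytic_stream(llm: LLM, buggy_code: str) -> str:
    # Assumed prompt: analyze the buggy code on its own.
    return llm(f"[analytic]\nList the suspected bugs in this code:\n{buggy_code}")


def integration_stream(llm: LLM, buggy_code: str, reference_code: str, analysis: str) -> str:
    # Assumed prompt: combine the reference code with the analysis results.
    prompt = (
        "[integration]\nRepair the buggy code using the reference and the analysis.\n"
        f"Buggy code:\n{buggy_code}\n"
        f"Reference code:\n{reference_code}\n"
        f"Analysis:\n{analysis}"
    )
    return llm(prompt)


def scaffold_debug(llm: LLM, problem: str, buggy_code: str) -> DebugResult:
    reference_code = scaffold_stream(llm, problem)
    analysis = analytic_stream(llm, buggy_code)
    fixed_code = integration_stream(llm, buggy_code, reference_code, analysis)
    return DebugResult(reference_code, analysis, fixed_code)


def pass_rate(passed: Sequence[bool]) -> float:
    # Assumed definition: percentage of problems whose fix passes its tests.
    if not passed:
        raise ValueError("pass_rate needs at least one result")
    return 100.0 * sum(passed) / len(passed)
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; a real evaluation would also need a test harness to decide whether each `fixed_code` passes.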

A new framework for code debugging, called Dual-Process Scaffold Reasoning, has been proposed. It achieves an 88.91% pass rate and an average inference time of 5.36 seconds per problem on DebugBench. The framework draws on psychological theories to improve reasoning, focusing on System 2 reasoning (the model's intermediate steps), which has been underexplored. It outperforms other reasoning approaches across various LLMs in both accuracy and efficiency on debugging tasks.

Dual-Process Scaffold Reasoning for Enhancing LLM Code Debugging
