arXiv:2511.02130v1 Announce Type: cross 
Abstract: We propose Re-FORC, an adaptive reward prediction method that, given a context, predicts the expected future reward as a function of the number of future thinking tokens. Re-FORC trains a lightweight adapter on reasoning models, and its predictions improve with longer reasoning and larger models. Re-FORC enables: 1) early stopping of unpromising reasoning chains, reducing compute by 26% while maintaining accuracy; 2) optimized model and thinking-length selection that achieves 4% higher accuracy at equal compute and 55% less compute at equal accuracy compared to the largest model; 3) adaptive test-time scaling, which increases accuracy by 11% in the high-compute regime and 7% in the low-compute regime. Re-FORC allows dynamic reasoning with length control via cost-per-token thresholds while estimating computation time upfront.
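
To make the cost-per-token threshold idea concrete, here is a minimal sketch of one way such a stopping rule could work. It is not the paper's implementation: the function names, the candidate budgets, and the toy forecaster with its made-up numbers are all assumptions for illustration. The rule compares the predicted reward gain from extra thinking tokens against their cost and stops when no budget pays for itself.

```python
from typing import Callable, Sequence


def best_extra_tokens(
    predict_reward: Callable[[int], float],
    candidate_budgets: Sequence[int],
    cost_per_token: float,
) -> int:
    """Return the extra thinking budget with the highest net gain, or 0 to stop.

    predict_reward(t) is a hypothetical forecaster: the expected reward if the
    model thinks for t more tokens from the current context.
    """
    baseline = predict_reward(0)  # expected reward if we stop thinking now
    best_budget, best_gain = 0, 0.0
    for budget in candidate_budgets:
        # Net gain = predicted reward improvement minus the cost of the tokens.
        gain = predict_reward(budget) - baseline - cost_per_token * budget
        if gain > best_gain:
            best_budget, best_gain = budget, gain
    return best_budget


def toy_forecast(extra_tokens: int) -> float:
    """Toy stand-in for a learned forecaster (made-up saturating curve)."""
    return 0.9 - 0.5 * (0.5 ** (extra_tokens / 1000))


if __name__ == "__main__":
    budgets = [500, 1000, 2000, 4000]
    # A low token cost favors more thinking; a high cost favors stopping early.
    print(best_extra_tokens(toy_forecast, budgets, cost_per_token=1e-4))
    print(best_extra_tokens(toy_forecast, budgets, cost_per_token=1e-3))
```

Raising `cost_per_token` shortens the chosen budget until the rule returns 0, which corresponds to stopping the chain early. The chosen budget also gives an upfront estimate of how many more tokens the run is expected to use.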

Re-FORC is an adaptive reward prediction method for reasoning models: it predicts the expected future reward as a function of the number of additional thinking tokens. This enables early stopping of unpromising reasoning chains, reducing compute by 26% while maintaining accuracy, and points toward more efficient AI reasoning.

Re-FORC: Adaptive Reward Prediction for Efficient Chain-of-Thought Reasoning
