arXiv:2511.02690v1 Announce Type: new 
Abstract: Training agents to operate under strict constraints during deployment, such as limited resource budgets or stringent safety requirements, presents significant challenges, especially when these constraints render the task complex. In this work, we propose a curriculum learning strategy that gradually tightens constraints during training, enabling the agent to incrementally master the deployment requirements. Inspired by self-paced learning techniques in unconstrained reinforcement learning (RL), our approach facilitates a smoother transition to challenging environments by initially training on simplified versions of the constraints and progressively introducing the full deployment conditions. We provide a theoretical analysis using an RL agent in a binary-tree Markov Decision Process (MDP) to demonstrate that our curriculum strategy can accelerate training relative to a baseline approach that imposes the trajectory constraints from the outset. We also empirically validate the effectiveness and generality of our method across both RL and large language model (LLM) agents in diverse settings, including a binary-tree MDP, a multi-task navigation domain, and a math reasoning task with two benchmarks. These results highlight the potential of curriculum design in enhancing the efficiency and performance of agents operating under complex trajectory constraints during deployment. When applied to LLMs, our strategy also compresses output chain-of-thought tokens, yielding a substantial inference speedup on consumer hardware and demonstrating its effectiveness for resource-constrained deployment.

This article presents a curriculum learning strategy for training agents that must satisfy strict constraints at deployment. Training starts with relaxed versions of the constraints and tightens them gradually until the full deployment conditions apply, which helps agents master tasks that are hard to learn when the constraints are imposed from the outset.
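
To make the idea of gradually tightening a constraint concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's method: the linear schedule, the function names `budget_at_step` and `constrained_reward`, and the specific budget values are all hypothetical, and the paper's own tightening rule (described only as inspired by self-paced learning) is not reproduced here. The constraint is modeled as a token budget on the chain-of-thought output, and a trajectory earns its task reward only if it stays within the current budget.

```python
def budget_at_step(step: int, total_steps: int,
                   start_budget: int, deploy_budget: int) -> int:
    """Hypothetical linear schedule: loosest budget at step 0,
    deployment budget at the final step."""
    if total_steps <= 1:
        return deploy_budget
    # Fraction of the curriculum completed, clamped to [0, 1].
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return round(start_budget + frac * (deploy_budget - start_budget))


def constrained_reward(task_reward: float, tokens_used: int, budget: int) -> float:
    """Assumed constraint handling: a trajectory that exceeds the
    current budget receives no reward."""
    return task_reward if tokens_used <= budget else 0.0


# Illustrative numbers only: tighten from 1000 tokens to 200 over 5 stages.
schedule = [budget_at_step(s, 5, 1000, 200) for s in range(5)]
print(schedule)
```

The baseline described in the abstract would correspond to using `deploy_budget` at every step, so early trajectories that overshoot the budget would rarely receive any reward signal.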

Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs
