The 2025 Planning Performance of Frontier Large Language Models

arXiv (cs.LG), Thursday, November 13, 2025 at 5:00:00 AM
A 2025 evaluation of frontier Large Language Models (LLMs), including DeepSeek R1, Gemini 2.5 Pro, and GPT-5, reports notable gains in planning capability. The study gave the models tasks written as PDDL (Planning Domain Definition Language) domain and task descriptions and found that GPT-5's performance is competitive with LAMA, an established classical planner. On obfuscated tasks designed to test pure reasoning, however, all models performed worse, although the drop was smaller than for earlier LLM generations. The improvements are real, but reasoning on such tasks remains a challenge. Overall, the results suggest that LLMs are becoming more capable on complex planning problems and are narrowing the gap with traditional planners. They also help clarify both the potential applications and the limitations of LLMs in real-world problem solving.
— via World Pulse Now AI Editorial System
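The summary does not say how the study obfuscated its tasks. One simple scheme, assumed here purely for illustration, is to rename every domain-specific symbol (types, predicates, actions, objects, variables) to a meaningless token while leaving PDDL's structure and keywords intact, so a model cannot lean on suggestive names such as `pick-up` or `handempty`. The Python sketch below does this with a regular expression; the function name `obfuscate_pddl`, the partial keyword list, and the 6-letter token scheme are all hypothetical, not the paper's method.

```python
import random
import re
import string
from typing import Optional

# Hypothetical, partial list of PDDL words written without a ":" prefix that
# must survive renaming; assumed sufficient for simple STRIPS-style files.
PDDL_KEYWORDS = {
    "define", "domain", "problem", "and", "not", "or", "either",
    "forall", "exists", "when", "object",
}

# An identifier, optionally preceded by ":" (section or requirement keyword)
# or "?" (variable).
IDENT = re.compile(r"([:?]?)([A-Za-z][A-Za-z0-9_-]*)")


def obfuscate_pddl(
    text: str, seed: int = 0, mapping: Optional[dict[str, str]] = None
) -> tuple[str, dict[str, str]]:
    """Rename every non-keyword identifier to a random 6-letter token.

    Within one call the same name always maps to the same token. Passing a
    previous call's mapping keeps names shared between a domain file and its
    problem files consistent.
    """
    rng = random.Random(seed)
    mapping = {} if mapping is None else mapping

    def fresh_token() -> str:
        # Draw until the token collides with neither a keyword nor a used token.
        while True:
            tok = "".join(rng.choices(string.ascii_lowercase, k=6))
            if tok not in PDDL_KEYWORDS and tok not in mapping.values():
                return tok

    def rename(m: re.Match) -> str:
        prefix, name = m.group(1), m.group(2)
        key = name.lower()  # PDDL identifiers are case-insensitive
        if prefix == ":" or key in PDDL_KEYWORDS:
            return m.group(0)  # keep :action, :effect, and, not, ...
        if key not in mapping:
            mapping[key] = fresh_token()
        return prefix + mapping[key]  # keep "?" on variables

    return IDENT.sub(rename, text), mapping
```

A real pipeline would parse the PDDL properly instead of relying on a regex and a hand-written keyword list, but the sketch shows the core idea: the task's logical structure is unchanged while every meaningful name disappears.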
