The 2025 Planning Performance of Frontier Large Language Models
An evaluation of frontier Large Language Models (LLMs) in 2025 reports notable advances in planning capability, particularly for DeepSeek R1, Gemini 2.5 Pro, and GPT-5. The study gave the models domain and task descriptions written in PDDL (Planning Domain Definition Language) and found that GPT-5 is competitive with the established planner LAMA at solving the tasks. When the tasks were obfuscated to test pure reasoning, performance declined for all models, though less severely than it did for earlier generations. LLMs are therefore becoming more capable in complex planning scenarios and are narrowing the gap with traditional planning methods, but reasoning on obfuscated tasks remains a challenge. These findings help clarify both the potential applications and the limitations of LLMs in real-world problem solving.
— via World Pulse Now AI Editorial System
