DynaSolidGeo: A Dynamic Benchmark for Genuine Spatial Mathematical Reasoning of VLMs in Solid Geometry

arXiv (cs.LG), Wednesday, November 12, 2025 at 5:00:00 AM
DynaSolidGeo is a dynamic benchmark for evaluating the spatial reasoning capabilities of Vision-Language Models (VLMs) in solid geometry. Traditional benchmarks have largely been limited to 2D plane geometry and to static datasets, which are prone to data contamination, and they tend to score only final answers rather than the reasoning process. DynaSolidGeo addresses these shortcomings with a semi-automatic annotation pipeline that produces 503 expert-curated seed questions, from which a vast array of multimodal text-visual instances can be generated dynamically. The benchmark assesses answer accuracy and also incorporates process evaluation based on expert-annotated reasoning chains, measuring logical validity and causal coherence. Initial experiments indicate that VLMs exhibit large performance gaps and degrade significantly in dynamic settings, particularly on tasks that demand high-level spatial intelligence. This highlights the necessity for improved benchmarks that reflect …
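To make the idea of dynamic generation from a seed question concrete, here is a minimal sketch in Python. It is not the DynaSolidGeo pipeline: the cuboid question, the names Instance, cuboid_diagonal_seed, generate, and accuracy, and the parameter ranges are all hypothetical, and the sketch is text-only, whereas the benchmark also produces visual content. It only illustrates how one parameterized seed can yield many fresh instances with computed ground-truth answers, so that a memorized static answer does not help.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    question: str
    answer: float


def cuboid_diagonal_seed(rng: random.Random) -> Instance:
    """Hypothetical seed question: space diagonal of a cuboid with random edges."""
    a, b, c = (rng.randint(1, 20) for _ in range(3))
    question = (
        f"A cuboid has edge lengths {a}, {b}, and {c}. "
        "What is the length of its space diagonal?"
    )
    # Ground truth is computed from the sampled parameters, not stored statically.
    return Instance(question, math.sqrt(a * a + b * b + c * c))


def generate(seed_fn, n, base_seed=0):
    """Instantiate one seed question n times, each with its own seeded RNG."""
    return [seed_fn(random.Random(base_seed + i)) for i in range(n)]


def accuracy(predictions, instances, tol=1e-6):
    """Answer-only accuracy; process evaluation is outside this sketch."""
    hits = sum(
        math.isclose(p, inst.answer, abs_tol=tol)
        for p, inst in zip(predictions, instances)
    )
    return hits / len(instances)


if __name__ == "__main__":
    for inst in generate(cuboid_diagonal_seed, 3, base_seed=42):
        print(inst.question, "->", round(inst.answer, 4))
```

Seeding each instance's random generator keeps an evaluation run reproducible, while changing base_seed yields a new set of instances from the same seed question.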
— via World Pulse Now AI Editorial System
