arXiv:2511.08294v2
Abstract: Accurate 3D human pose estimation is fundamental for applications such as augmented reality and human-robot interaction. State-of-the-art multi-view methods learn to fuse predictions across views by training on large annotated datasets, which leads to poor generalization when the test scenario differs from the training data. To overcome these limitations, we propose SkelSplat, a novel framework for multi-view 3D human pose estimation based on differentiable Gaussian rendering. Human pose is modeled as a skeleton of 3D Gaussians, one per joint, optimized via differentiable rendering to enable seamless fusion of arbitrary camera views without 3D ground-truth supervision. Since Gaussian Splatting was originally designed for dense scene reconstruction, we propose a novel one-hot encoding scheme that enables independent optimization of human joints. SkelSplat outperforms approaches that do not rely on 3D ground truth on Human3.6M and CMU, while reducing the cross-dataset error by up to 47.8% compared to learning-based methods. Experiments on Human3.6M-Occ and Occlusion-Person demonstrate robustness to occlusions, without scenario-specific fine-tuning. Our project page is available here: https://skelsplat.github.io.

SkelSplat has been introduced as a novel framework for multi-view 3D human pose estimation, utilizing differentiable Gaussian rendering to enhance accuracy without relying on 3D ground-truth supervision. This method models human pose using a skeleton of 3D Gaussians optimized for seamless fusion across various camera views.
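
The abstract describes one 3D Gaussian per joint and a one-hot encoding that lets joints be optimized independently. The sketch below is one possible reading of that idea, not the authors' implementation: it assumes a pinhole camera, isotropic 2D Gaussians of fixed pixel width, and an image with one channel per joint. The function names `project` and `splat_one_hot`, the camera parameters, and the toy joint positions are all hypothetical. It only shows the forward rendering step and omits the gradient-based optimization.

```python
import numpy as np

def project(points_3d, K, R, t):
    """Pinhole projection of (J, 3) world points to (J, 2) pixel coordinates."""
    cam = points_3d @ R.T + t          # world frame -> camera frame
    uv = cam @ K.T                     # apply camera intrinsics
    return uv[:, :2] / uv[:, 2:3]      # perspective divide

def splat_one_hot(means_2d, sigma, height, width):
    """Render J isotropic 2D Gaussians into a (J, H, W) image.

    Assumption: "one-hot" is read here as "joint j writes only to channel j",
    so a per-channel image loss never mixes contributions from two joints.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    dx = xs[None] - means_2d[:, 0, None, None]
    dy = ys[None] - means_2d[:, 1, None, None]
    return np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))

# Hypothetical toy setup: three joints seen by one camera at the origin.
joints = np.array([[0.00, 0.00, 4.0],
                   [0.04, -0.48, 4.0],
                   [0.30, 0.20, 4.2]])
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 32.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)

means_2d = project(joints, K, R, t)
rendered = splat_one_hot(means_2d, sigma=2.0, height=64, width=64)

# Moving one joint changes only that joint's channel.
moved = joints.copy()
moved[2] += np.array([0.1, 0.0, 0.0])
rendered_moved = splat_one_hot(project(moved, K, R, t), 2.0, 64, 64)
unchanged = np.array_equal(rendered[:2], rendered_moved[:2])
print(rendered.shape, unchanged)
```

Under this reading, a loss computed channel by channel against per-joint 2D detections from each view would yield gradients for joint j that depend only on joint j's Gaussian, which is the kind of independence the abstract refers to.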

SkelSplat: Robust Multi-view 3D Human Pose Estimation with Differentiable Gaussian Rendering
