arXiv:2511.02777v1 Announce Type: new 
Abstract: We present PercHead, a method for single-image 3D head reconstruction and semantic 3D editing, two tasks that are inherently challenging because of severe view occlusions, weak perceptual supervision, and the ambiguity of editing in 3D space. We develop a unified base model for reconstructing view-consistent 3D heads from a single input image. The model employs a dual-branch encoder followed by a ViT-based decoder that lifts 2D features into 3D space through iterative cross-attention. Rendering is performed using Gaussian Splatting. At the heart of our approach is a novel perceptual supervision strategy based on DINOv2 and SAM2.1, which provides rich, generalized signals for both geometric and appearance fidelity. Our model achieves state-of-the-art performance in novel-view synthesis and exhibits exceptional robustness to extreme viewing angles compared to established baselines. This base model can also be extended to semantic 3D editing by swapping the encoder and finetuning the network. In this variant, we disentangle geometry and style through two distinct input modalities: a segmentation map to control geometry, and either a text prompt or a reference image to specify appearance. We demonstrate the model's 3D editing capabilities through a lightweight, interactive GUI, where users sculpt geometry by drawing segmentation maps and stylize appearance via natural language or image prompts.
  Project Page: https://antoniooroz.github.io/PercHead
  Video: https://www.youtube.com/watch?v=4hFybgTk4kE
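
The abstract states that the decoder lifts 2D features into 3D space through iterative cross-attention. As a rough illustration of that general idea only, and not the authors' implementation, the sketch below lets a set of query tokens repeatedly attend over 2D feature tokens. The function names `cross_attention` and `lift_to_3d`, the residual update, the iteration count, and the omission of learned projections and normalization layers are all assumptions made to keep the example dependency-free.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, features):
    """One cross-attention step: each query attends over all feature tokens.

    Assumption: keys and values are the raw 2D feature tokens, with no
    learned projections, so this is scaled dot-product attention only.
    """
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    updated = []
    for q in queries:
        # Similarity between this query and every 2D feature token.
        scores = [scale * sum(qi * fi for qi, fi in zip(q, f)) for f in features]
        weights = softmax(scores)
        # Attention-weighted average of the feature tokens.
        attended = [
            sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)
        ]
        # Residual update (an assumption): the query accumulates image evidence.
        updated.append([qi + ai for qi, ai in zip(q, attended)])
    return updated


def lift_to_3d(queries, features, num_iters=3):
    """Hypothetical helper: refine query tokens by repeated cross-attention."""
    for _ in range(num_iters):
        queries = cross_attention(queries, features)
    return queries


if __name__ == "__main__":
    image_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2D features
    query_tokens = [[0.5, 0.0], [0.0, 0.5]]  # toy queries
    for token in lift_to_3d(query_tokens, image_tokens):
        print(token)
```

In a full model, the refined tokens would presumably be decoded into the parameters of the Gaussians used for rendering; that step is not shown here because the abstract gives no details about it.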

PercHead is a method for reconstructing and editing 3D heads from a single image. It addresses severe view occlusions, weak perceptual supervision, and the ambiguity of editing in 3D space. Its unified base model uses a dual-branch encoder and a ViT-based decoder that lifts 2D features into 3D space through iterative cross-attention, and the authors report state-of-the-art novel-view synthesis.

PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction & Editing
