arXiv:2511.07819v1 Announce Type: new 
Abstract: Human motion synthesis in 3D scenes relies heavily on scene comprehension, yet current methods focus mainly on scene structure and ignore semantic understanding. In this paper, we propose a human motion synthesis framework, termed SSOMotion, that takes a unified Scene Semantic Occupancy (SSO) as its scene representation. We design a bi-directional tri-plane decomposition to derive a compact version of the SSO, and scene semantics are mapped to a unified feature space via CLIP encoding and shared linear dimensionality reduction. This strategy captures fine-grained scene semantic structures while significantly reducing redundant computation. We further use these scene hints, together with the movement direction derived from instructions, for motion control via frame-wise scene query. Extensive experiments and ablation studies on cluttered scenes built from ShapeNet furniture, as well as scanned scenes from the PROX and Replica datasets, demonstrate its cutting-edge performance and validate its effectiveness and generalization ability. Code will be publicly available at https://github.com/jingyugong/SSOMotion.
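
The abstract does not spell out how the decomposition or the scene query is computed, so the following is only a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes that "bi-directional" means scanning each axis from both ends and keeping the first occupied voxel, that label 0 marks free space, and that random vectors can stand in for CLIP text embeddings. All function and variable names (`first_hit_labels`, `triplane_features`, `query_planes`, `projection`) are hypothetical.

```python
import numpy as np

EMPTY = 0  # assumption: label 0 denotes free space


def first_hit_labels(sso, axis, reverse=False):
    """Label of the first occupied voxel met when scanning along `axis`."""
    grid = np.flip(sso, axis=axis) if reverse else sso
    occupied = grid != EMPTY
    first = np.argmax(occupied, axis=axis)  # index of first hit, 0 if none
    labels = np.take_along_axis(grid, np.expand_dims(first, axis), axis=axis)
    labels = np.squeeze(labels, axis=axis)
    # Rays that hit nothing stay EMPTY.
    return np.where(occupied.any(axis=axis), labels, EMPTY)


def triplane_features(sso, class_embeddings, projection):
    """Build six feature planes: two scan directions for each of three axes."""
    # One shared linear map reduces every class embedding to d dimensions.
    reduced = class_embeddings @ projection  # (num_classes, d)
    planes = {}
    for axis in range(3):
        for reverse in (False, True):
            labels = first_hit_labels(sso, axis, reverse)
            planes[(axis, reverse)] = reduced[labels]  # (H, W, d)
    return planes


def query_planes(planes, voxel_index):
    """Concatenate plane features at one voxel's projections onto every plane."""
    feats = []
    for (axis, _), plane in planes.items():
        # Drop the scanned axis; the two remaining coordinates index the plane.
        u, v = [voxel_index[a] for a in range(3) if a != axis]
        feats.append(plane[u, v])
    return np.concatenate(feats)


rng = np.random.default_rng(0)
sso = np.zeros((4, 4, 4), dtype=np.int64)
sso[1, 2, 0] = 1  # e.g. a "chair" voxel near the floor
sso[1, 2, 3] = 2  # e.g. a "lamp" voxel above it

class_embeddings = rng.normal(size=(3, 512))  # stand-in for CLIP embeddings
class_embeddings[0] = 0.0                     # free space carries no semantics
projection = rng.normal(size=(512, 8))        # shared reduction to 8 dims

planes = triplane_features(sso, class_embeddings, projection)
feature = query_planes(planes, (1, 2, 1))     # one query per body point per frame
```

In this toy grid the two planes along the vertical axis disagree at column (1, 2): the forward scan records the chair and the reverse scan records the lamp, which illustrates why a single projection per axis could lose objects stacked along a ray. A frame-wise query would repeat `query_planes` for the relevant body positions at every frame.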

A new framework for human motion synthesis in 3D scenes, named SSOMotion, has been proposed, emphasizing semantic understanding alongside scene structure. This approach utilizes a unified Scene Semantic Occupancy (SSO) representation and has shown cutting-edge performance in experiments using datasets like ShapeNet, PROX, and Replica. The code will be publicly available, enhancing accessibility for further research in this area.

Human Motion Synthesis in 3D Scenes via Unified Scene Semantic Occupancy
