One4D: Unified 4D Generation and Reconstruction via Decoupled LoRA Control

arXiv — cs.CV · Tuesday, November 25, 2025, 5:00 AM
  • One4D has been introduced as a unified framework for 4D generation and reconstruction, capable of producing dynamic 4D content through synchronized RGB frames and pointmaps. This framework utilizes a Unified Masked Conditioning mechanism to handle varying sparsities of conditioning frames, allowing for seamless transitions between 4D generation from a single image and reconstruction from full videos or sparse frames.
  • The introduction of One4D is significant as it addresses challenges in joint RGB and pointmap generation, particularly the limitations of existing diffusion finetuning strategies. By implementing Decoupled LoRA Control, One4D enhances the capabilities of video generation models, potentially leading to more realistic and versatile 4D content creation.
  • This development reflects a broader trend in AI video generation, where advances such as object-aware motion generation and controllable scene generation are becoming increasingly prominent. Integrating multiple modalities in a single model, and tackling the limits of existing diffusion finetuning, point toward more sophisticated and realistic outputs in video and image generation.
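The article does not spell out how Decoupled LoRA Control works internally. A minimal sketch of the general idea it names — two independent low-rank (LoRA) adapters sharing one frozen backbone weight, one adapting the RGB branch and one the pointmap branch — where the branch names, shapes, rank, and scaling are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

class LoRALinear:
    """Frozen base weight plus a trainable low-rank update:
    y = x @ W_base + (x @ A) @ B * (alpha / rank)."""
    def __init__(self, w_base, rank=4, alpha=8.0):
        d_in, _ = w_base.shape
        self.w_base = w_base                          # frozen, shared across branches
        self.a = rng.normal(0, 0.02, (d_in, rank))    # trainable down-projection
        self.b = np.zeros((rank, w_base.shape[1]))    # trainable up-projection, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        return x @ self.w_base + (x @ self.a) @ self.b * self.scale

# One shared frozen backbone weight, two decoupled adapters (hypothetical branches):
w_shared = rng.normal(0, 0.1, (16, 16))
rgb_branch = LoRALinear(w_shared)        # would adapt RGB-frame features
pointmap_branch = LoRALinear(w_shared)   # would adapt pointmap (geometry) features

x = rng.normal(size=(2, 16))
y_rgb = rgb_branch(x)
y_pts = pointmap_branch(x)
```

Because the up-projection `B` is zero-initialized, both branches start out exactly equal to the frozen backbone, which is the standard LoRA property that makes finetuning stable; each branch then learns its own low-rank update independently.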
— via World Pulse Now AI Editorial System


Continue Reading
SPAGS: Sparse-View Articulated Object Reconstruction from Single State via Planar Gaussian Splatting
Positive · Artificial Intelligence
A new framework for articulated object reconstruction has been proposed, utilizing planar Gaussian Splatting to reconstruct 3D objects from sparse-view RGB images captured from a single state. This method introduces a Gaussian information field to optimize viewpoint selection and employs a coarse-to-fine optimization strategy for depth estimation and part segmentation.
A Tri-Modal Dataset and a Baseline System for Tracking Unmanned Aerial Vehicles
Positive · Artificial Intelligence
A new dataset named MM-UAV has been introduced, designed for tracking unmanned aerial vehicles (UAVs) using a multi-modal approach that includes RGB, infrared, and event signals. This dataset features over 30 challenging scenarios with 1,321 synchronized sequences and more than 2.8 million annotated frames, addressing the limitations of single-modality tracking in difficult conditions.
MambaRefine-YOLO: A Dual-Modality Small Object Detector for UAV Imagery
Positive · Artificial Intelligence
MambaRefine-YOLO has been introduced as a dual-modality small object detector specifically designed for Unmanned Aerial Vehicle (UAV) imagery, addressing the challenges of low resolution and background clutter in small object detection. The model incorporates a Dual-Gated Complementary Mamba fusion module (DGC-MFM) and a Hierarchical Feature Aggregation Neck (HFAN), achieving a state-of-the-art mean Average Precision (mAP) of 83.2% on the DroneVehicle dataset.
Show Me: Unifying Instructional Image and Video Generation with Diffusion Models
Positive · Artificial Intelligence
The recent introduction of ShowMe, a unified framework for instructional image and video generation, addresses the limitations of previous methods that treated image manipulation and video prediction as separate tasks. By activating spatial and temporal components of video diffusion models, ShowMe enhances the generation of visual instructions in interactive world simulators.
A Theory-Inspired Framework for Few-Shot Cross-Modal Sketch Person Re-Identification
Positive · Artificial Intelligence
A new framework called KTCAA has been introduced for few-shot cross-modal sketch person re-identification, aiming to bridge the gap between hand-drawn sketches and RGB surveillance images. This framework addresses challenges related to domain discrepancy and perturbation invariance, proposing innovative components like Alignment Augmentation and Knowledge Transfer Catalyst to enhance model robustness and alignment capabilities.
Are Image-to-Video Models Good Zero-Shot Image Editors?
Positive · Artificial Intelligence
A new framework called IF-Edit has been introduced, leveraging large-scale video diffusion models for zero-shot image editing. This method addresses challenges such as prompt misalignment and blurry late-stage frames, enhancing the capabilities of pretrained models for instruction-driven image editing.
Roadside Monocular 3D Detection Prompted by 2D Detection
Positive · Artificial Intelligence
The introduction of the Promptable 3D Detector (Pro3D) marks a significant advancement in roadside monocular 3D detection, which involves identifying objects in RGB frames and predicting their 3D attributes, such as bird's-eye-view locations. This innovation leverages 2D detections as prompts to enhance the accuracy and efficiency of 3D detection processes.
LinVideo: A Post-Training Framework towards O(n) Attention in Efficient Video Generation
Positive · Artificial Intelligence
LinVideo has been introduced as a post-training framework that enhances video generation efficiency by replacing certain self-attention modules with linear attention, addressing the quadratic computational costs associated with traditional video diffusion models. This method preserves the original model's performance while significantly reducing resource demands.
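LinVideo's exact formulation is not given here, but the generic linear-attention trick it builds on replaces the quadratic `softmax(QKᵀ)V` with a kernel feature map φ so that `φ(Q)(φ(K)ᵀV)` can be computed in O(n). A minimal sketch using the common elu+1 feature map (an assumption — the paper's kernel and post-training procedure may differ):

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """O(n) attention: phi(q) @ (phi(k).T @ v), normalized per query.
    phi(x) = elu(x) + 1 keeps all features positive."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    q, k = phi(q), phi(k)
    kv = k.T @ v                  # (d, d_v): cost linear in sequence length n
    z = q @ k.sum(axis=0)         # (n,): per-query normalizer
    return (q @ kv) / (z[:, None] + eps)

rng = np.random.default_rng(0)
n, d = 1024, 32                   # sequence length, head dimension (illustrative)
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
out = linear_attention(q, k, v)   # out.shape == (1024, 32)
```

By associativity, this equals the explicit `n × n` attention matrix applied to `v`, but the `kv` summary is only `d × d_v`, which is what removes the quadratic cost; LinVideo's post-training step swaps modules like this into an already-trained video diffusion model rather than training from scratch.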