UniMM-V2X: MoE-Enhanced Multi-Level Fusion for End-to-End Cooperative Autonomous Driving

arXiv — cs.CV•Thursday, November 13, 2025 at 5:00:00 AM

UniMM-V2X is an end-to-end framework for cooperative autonomous driving that targets a limitation of existing systems: agents that often operate in isolation. Its multi-level fusion strategy lets agents cooperate, so that they perceive, predict, and plan collaboratively. A Mixture-of-Experts (MoE) architecture dynamically enhances the learned representations. In experiments on the DAIR-V2X dataset, the framework reports state-of-the-art results: a 39.7% improvement in perception accuracy, a 7.2% reduction in prediction error, and a 33.2% improvement in planning performance. These results suggest that cooperative autonomous driving could support safer and more efficient transportation systems.
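The summary does not describe how the Mixture-of-Experts component is built, so the following is only a generic illustration of the idea, not the UniMM-V2X implementation. It sketches a common top-k gated MoE layer in plain Python: a gate scores each expert for an input feature, the best-scoring experts are selected, and their outputs are mixed by renormalized gate weights. Every name here (`moe_layer`, `gate_weights`, `expert_weights`, `top_k`) is hypothetical, and the experts are assumed to be simple linear maps.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def moe_layer(feature, gate_weights, expert_weights, top_k=2):
    """Route one feature vector through the top_k highest-scoring experts.

    Assumption: each expert is a linear map; a real model would use
    learned networks. Returns (mixed_output, {expert_index: weight}).
    """
    # The gate produces one probability per expert for this input.
    gate_probs = softmax(matvec(gate_weights, feature))

    # Keep only the top_k experts and renormalize their weights to sum to 1.
    ranked = sorted(range(len(gate_probs)), key=lambda i: gate_probs[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(gate_probs[i] for i in chosen)
    mix = {i: gate_probs[i] / norm for i in chosen}

    # Weighted sum of the selected experts' outputs.
    output = [0.0] * len(expert_weights[0])
    for i, weight in mix.items():
        expert_out = matvec(expert_weights[i], feature)
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, mix


# Toy usage with random weights (illustrative values only).
random.seed(0)
dim, num_experts = 4, 3
feature = [random.uniform(-1, 1) for _ in range(dim)]
gate_weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]
expert_weights = [
    [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    for _ in range(num_experts)
]
fused, mix = moe_layer(feature, gate_weights, expert_weights, top_k=2)
print("selected experts and weights:", mix)
print("mixed feature:", fused)
```

Because only `top_k` experts run for each input, this style of routing lets a model grow its number of experts without a matching growth in per-input computation. Whether UniMM-V2X uses top-k routing or another gating scheme is not stated in the summary.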

— via World Pulse Now AI Editorial System
