TopoStreamer: Temporal Lane Segment Topology Reasoning in Autonomous Driving

arXiv — cs.CV · Thursday, November 13, 2025 at 5:00:00 AM
The introduction of TopoStreamer marks a significant advancement in the field of autonomous driving, specifically in lane segment topology reasoning. By addressing the limitations of existing methods, TopoStreamer enhances the accuracy of road network reconstruction, which is vital for safe and efficient autonomous maneuvers such as turning and lane changing. The model incorporates innovative features like streaming attribute constraints to maintain temporal consistency, dynamic lane boundary positional encoding for real-time updates, and lane segment denoising to better capture lane patterns. These improvements were assessed using the OpenLane-V2 dataset, demonstrating TopoStreamer's superior performance compared to state-of-the-art methods. This development not only contributes to the ongoing evolution of autonomous driving technology but also underscores the importance of precise lane segment understanding in ensuring the safety and effectiveness of self-driving vehicles.
— via World Pulse Now AI Editorial System
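The paper itself does not publish its encoding formula in this summary, but the idea behind dynamic lane boundary positional encoding — recomputing a positional prior from the current boundary-point estimates each frame so it stays consistent as the ego vehicle moves — can be sketched with standard sinusoidal encodings. All names and the encoding choice below are illustrative assumptions, not TopoStreamer's actual implementation:

```python
import numpy as np

def sinusoidal_pe(coords, dim=32, temperature=10000.0):
    """Sinusoidal positional encoding for 2D lane-boundary points.

    coords: (N, 2) array of (x, y) boundary points in ego coordinates.
    Returns an (N, 2 * dim) encoding: dim features per axis.
    """
    freqs = temperature ** (-np.arange(0, dim, 2) / dim)  # (dim/2,) frequencies
    out = []
    for axis in range(coords.shape[1]):
        angles = coords[:, axis:axis + 1] * freqs          # (N, dim/2)
        out.append(np.concatenate([np.sin(angles), np.cos(angles)], axis=1))
    return np.concatenate(out, axis=1)

# Re-encode the *updated* boundary estimates every frame, so the
# positional prior tracks the lane geometry as the vehicle moves.
boundary_t0 = np.array([[0.0, 1.5], [5.0, 1.6], [10.0, 1.8]])
pe_t0 = sinusoidal_pe(boundary_t0)
print(pe_t0.shape)  # (3, 64)
```

The key point the sketch illustrates is that the encoding is a function of the current boundary estimate, not a fixed learned table, so it can be refreshed as lane positions are updated in the streaming setting.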


Recommended Readings
Enhancing End-to-End Autonomous Driving with Risk Semantic Distillation from VLM
Positive · Artificial Intelligence
The paper introduces Risk Semantic Distillation (RSD), a novel framework aimed at enhancing end-to-end autonomous driving (AD) systems. While current AD systems perform well in complex scenarios, they struggle with generalization to unseen situations. RSD leverages Vision-Language Models (VLMs) to improve training efficiency and consistency in trajectory planning, addressing challenges posed by hybrid AD systems that utilize multiple planning approaches. This advancement is crucial for the future of autonomous driving technology.
MMEdge: Accelerating On-device Multimodal Inference via Pipelined Sensing and Encoding
Positive · Artificial Intelligence
MMEdge is a proposed framework designed to enhance real-time multimodal inference on resource-constrained edge devices, crucial for applications like autonomous driving and mobile health. It addresses the challenges of sensing dynamics and inter-modality dependencies by breaking down the inference process into fine-grained sensing and encoding units. This allows for incremental computation as data is received, while a lightweight temporal aggregation module ensures accuracy by capturing rich temporal dynamics across different units.
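The pipelining idea described above — encoding each fine-grained sensing unit as it arrives rather than waiting for the full input window, then fusing the per-unit features with a lightweight temporal aggregator — can be sketched as follows. The linear encoder and mean aggregation are placeholder assumptions; MMEdge's actual modules are not specified in this summary:

```python
import numpy as np

def encode_unit(chunk, weight):
    """Encode one fine-grained sensing unit (illustrative linear encoder)."""
    return np.tanh(chunk @ weight)

def pipelined_inference(stream, weight):
    """Encode each unit as soon as it arrives, overlapping computation
    with sensing; a lightweight temporal aggregator (a mean here, as a
    stand-in) fuses the per-unit features at the end."""
    encoded = []
    for chunk in stream:                            # units arrive over time
        encoded.append(encode_unit(chunk, weight))  # incremental computation
    return np.mean(encoded, axis=0)                 # temporal aggregation

rng = np.random.default_rng(0)
weight = rng.normal(size=(8, 4))
stream = [rng.normal(size=(8,)) for _ in range(5)]  # 5 sensing units
feat = pipelined_inference(stream, weight)
print(feat.shape)  # (4,)
```

On an edge device the per-unit encoding would run concurrently with the sensor still producing the next unit, which is where the latency saving comes from.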
STONE: Pioneering the One-to-N Backdoor Threat in 3D Point Cloud
Positive · Artificial Intelligence
Backdoor attacks represent a significant risk to deep learning, particularly in critical 3D applications like autonomous driving and robotics. Current methods primarily focus on static one-to-one attacks, leaving the more versatile one-to-N backdoor threat largely unaddressed. The introduction of STONE (Spherical Trigger One-to-N Backdoor Enabling) marks a pivotal advancement, offering a configurable spherical trigger that can manipulate multiple output labels while maintaining high accuracy in clean data.
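As a toy illustration only — not STONE's actual method — a "spherical trigger" can be pictured as a small, configurable cluster of points stamped into a point cloud, where different trigger configurations (center, radius) could be mapped to different target labels, giving the one-to-N behavior. Every name and parameter here is a hypothetical stand-in:

```python
import numpy as np

def add_spherical_trigger(points, center, radius=0.1, n=32, seed=0):
    """Append a small sphere of n surface points at `center` — a toy
    stand-in for a configurable spherical backdoor trigger. In a
    one-to-N attack, different (center, radius) settings could be
    associated with different target labels."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # unit directions
    trigger = center + radius * dirs                     # points on the sphere
    return np.concatenate([points, trigger], axis=0)

cloud = np.zeros((1024, 3))                              # dummy point cloud
poisoned = add_spherical_trigger(cloud, center=np.array([0.5, 0.5, 0.5]))
print(poisoned.shape)  # (1056, 3)
```

The sketch only shows the geometric insertion; the attack itself would also require poisoning the training labels for samples carrying each trigger configuration.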
Cheating Stereo Matching in Full-scale: Physical Adversarial Attack against Binocular Depth Estimation in Autonomous Driving
Neutral · Artificial Intelligence
A recent study has introduced a novel physical adversarial attack targeting stereo matching models used in autonomous driving. Unlike traditional attacks that utilize 2D patches, this method employs a 3D physical adversarial example (PAE) with global camouflage texture, enhancing visual consistency across various viewpoints of stereo cameras. The research also presents a new 3D stereo matching rendering module to align the PAE with real-world positions, addressing the disparity effects inherent in binocular vision.
VLMs Guided Interpretable Decision Making for Autonomous Driving
Positive · Artificial Intelligence
Recent advancements in autonomous driving have investigated the application of vision-language models (VLMs) in visual question answering (VQA) frameworks for driving decision-making. However, these methods often rely on handcrafted prompts and exhibit inconsistent performance, which hampers their effectiveness in real-world scenarios. This study assesses state-of-the-art open-source VLMs on high-level decision-making tasks using ego-view visual inputs, revealing significant limitations in their ability to provide reliable, context-aware decisions.
Understanding World or Predicting Future? A Comprehensive Survey of World Models
Neutral · Artificial Intelligence
The article discusses the growing interest in world models, particularly in the context of advancements in multimodal large language models like GPT-4 and video generation models such as Sora. It provides a comprehensive review of the literature on world models, which serve to either understand the current state of the world or predict future dynamics. The review categorizes world models based on their functions: constructing internal representations and predicting future states, with applications in generative games, autonomous driving, robotics, and social simulacra.
FQ-PETR: Fully Quantized Position Embedding Transformation for Multi-View 3D Object Detection
Positive · Artificial Intelligence
The paper titled 'FQ-PETR: Fully Quantized Position Embedding Transformation for Multi-View 3D Object Detection' addresses the challenges of deploying PETR models in autonomous driving due to their high computational costs and memory requirements. It introduces FQ-PETR, a fully quantized framework that aims to enhance efficiency without sacrificing accuracy. Key innovations include a Quantization-Friendly LiDAR-ray Position Embedding and techniques to mitigate accuracy degradation typically associated with quantization methods.
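The summary does not give FQ-PETR's quantization scheme, but the baseline it improves on — uniform symmetric int8 quantization, here with a naive max-abs scale — is easy to sketch. This is a generic sketch of that baseline, not the paper's method:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization with a max-abs scale:
    the generic uniform scheme a fully quantized pipeline applies to
    weights and embeddings (position embeddings included)."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to float for comparison against the original."""
    return q.astype(np.float32) * scale

pe = np.linspace(-1.0, 1.0, 8, dtype=np.float32)  # stand-in embedding tensor
q, scale = quantize_int8(pe)
recon = dequantize(q, scale)
print(np.max(np.abs(recon - pe)) < scale)  # error stays under one step: True
```

The accuracy degradation FQ-PETR targets comes from exactly this rounding error accumulating through the network; its contribution is making the position-embedding path tolerate it.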
CATS-V2V: A Real-World Vehicle-to-Vehicle Cooperative Perception Dataset with Complex Adverse Traffic Scenarios
Positive · Artificial Intelligence
The CATS-V2V dataset introduces a pioneering real-world collection for Vehicle-to-Vehicle (V2V) cooperative perception, aimed at enhancing autonomous driving in complex adverse traffic scenarios. Collected using two time-synchronized vehicles, the dataset encompasses 100 clips featuring 60,000 frames of LiDAR point clouds and 1.26 million multi-view camera images across various weather and lighting conditions. This dataset is expected to significantly benefit the autonomous driving community by providing high-quality data for improved perception capabilities.