The PAVE dataset marks a significant advance in the evaluation of autonomous vehicles (AVs): it is the first end-to-end benchmark dataset collected entirely through autonomous driving in real-world conditions. It includes over 100 hours of naturalistic data from various production AV models, segmented into 32,727 key frames with synchronized camera images and high-precision GNSS/IMU data. The dataset is intended to deepen understanding of AV behavior and safety, providing insights for future development of autonomous driving technology.
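To make the key-frame structure concrete, the sketch below shows one plausible way to represent and load frames that pair a camera image with a GNSS/IMU record. The file layout, `index.json` schema, and field names are hypothetical illustrations, not the actual PAVE release format.

```python
import json
from dataclasses import dataclass
from pathlib import Path

@dataclass
class KeyFrame:
    """One synchronized key frame: camera image plus GNSS/IMU pose.
    Field names are illustrative, not the dataset's actual schema."""
    timestamp: float
    image_path: Path
    lat: float
    lon: float
    yaw: float  # heading from the IMU, in radians

def load_keyframes(root: Path) -> list[KeyFrame]:
    """Read a hypothetical index.json that pairs each image with its
    GNSS/IMU record; PAVE's real layout may differ."""
    frames = []
    for rec in json.loads((root / "index.json").read_text()):
        frames.append(KeyFrame(
            timestamp=rec["t"],
            image_path=root / "images" / rec["image"],
            lat=rec["gnss"]["lat"],
            lon=rec["gnss"]["lon"],
            yaw=rec["imu"]["yaw"],
        ))
    # sort by timestamp so frames replay in driving order
    return sorted(frames, key=lambda f: f.timestamp)
```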
Accurate monocular depth estimation is essential for understanding 3D scenes, yet current methods often blur depth at object boundaries, producing erroneous 3D points. This study introduces a self-supervised approach that models per-pixel depth as a mixture distribution, enabling sharp depth discontinuities without fine-grained supervision. The method integrates variance-aware loss functions and uncertainty propagation, achieving up to 35% higher boundary sharpness and improved point cloud quality on the KITTI and VKITTIv2 datasets.
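A minimal sketch of the core idea, assuming a K-component Laplacian mixture per pixel trained with a negative log-likelihood loss; the component family, tensor layout, and loss form here are illustrative assumptions, not the paper's exact design:

```python
import math
import torch
import torch.nn.functional as F

def mixture_depth_nll(mu, log_b, logits, target, eps=1e-6):
    """Negative log-likelihood of a per-pixel Laplacian mixture.

    mu:     (B, K, H, W) component depth means
    log_b:  (B, K, H, W) log scale per component (the variance-aware term)
    logits: (B, K, H, W) unnormalized mixing weights
    target: (B, 1, H, W) supervision signal (e.g., a reprojected depth
            estimate in a self-supervised pipeline)
    """
    log_pi = F.log_softmax(logits, dim=1)          # log mixing weights
    b = log_b.exp().clamp(min=eps)                 # guard the division
    # per-component Laplacian log-density: -|d - mu|/b - log(2b)
    log_p = -torch.abs(target - mu) / b - log_b - math.log(2.0)
    # log-sum-exp over components gives the mixture log-likelihood
    nll = -torch.logsumexp(log_pi + log_p, dim=1)
    return nll.mean()

def sharp_depth(mu, logits):
    """Point estimate that keeps discontinuities sharp: take the depth
    of the dominant component at each pixel instead of the expectation,
    which would blend foreground and background at boundaries."""
    idx = logits.argmax(dim=1, keepdim=True)
    return torch.gather(mu, 1, idx)
```

The design point is that at a boundary pixel the mixture can place mass on both the foreground and background surfaces; picking the dominant component avoids the blended intermediate depths that become floating 3D points.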
CAR-Scenes is a frame-level dataset for autonomous driving that supports training and evaluating vision-language models (VLMs) for scene-level understanding. It comprises 5,192 annotated images drawn from Argoverse, Cityscapes, KITTI, and nuScenes, labeled against a 28-key category/sub-category knowledge base. Annotations are produced by a GPT-4o-assisted pipeline with human verification, yielding detailed attributes that support semantic retrieval and risk-aware scenario mining.
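To illustrate how such frame-level attributes could drive retrieval and scenario mining, here is a minimal sketch of attribute matching over annotated frames; the attribute keys and values are invented for illustration and do not reflect CAR-Scenes' actual schema:

```python
from typing import Iterable

# A frame-level annotation as a flat attribute dict keyed by the
# category/sub-category knowledge base; keys shown below are
# hypothetical, not CAR-Scenes' real schema.
Annotation = dict[str, str]

def retrieve(frames: Iterable[tuple[str, Annotation]],
             query: Annotation) -> list[str]:
    """Return image IDs whose annotations match every queried
    attribute -- the simplest form of semantic retrieval over a
    structured, frame-level knowledge base."""
    return [img_id for img_id, attrs in frames
            if all(attrs.get(k) == v for k, v in query.items())]

# Example: mine a risk-aware scenario such as night-time frames with a
# vulnerable road user present (attribute names are hypothetical).
hits = retrieve(
    frames=[("img_0001", {"time_of_day": "night", "vru": "pedestrian"}),
            ("img_0002", {"time_of_day": "day", "vru": "none"})],
    query={"time_of_day": "night", "vru": "pedestrian"},
)
print(hits)  # ['img_0001']
```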