WCCNet: Wavelet-context Cooperative Network for Efficient Multispectral Pedestrian Detection

arXiv cs.CV • Monday, October 27, 2025 at 4:00:00 AM

Positive • Artificial Intelligence

WCCNet, a wavelet-context cooperative network, targets multispectral pedestrian detection, which combines RGB and infrared data to find pedestrians in challenging, low-visibility environments. The task matters for autonomous driving, where both accuracy and computational efficiency are vital. Unlike traditional methods that treat RGB and infrared data identically, WCCNet accounts for the distinct characteristics of each modality, with the aim of improving performance in real-world applications. The work could contribute to safer and more reliable autonomous vehicles.

— via World Pulse Now AI Editorial System

Continue Reading

SupLID: Geometrical Guidance for Out-of-Distribution Detection in Semantic Segmentation

arXiv cs.CV • a day ago

Positive • Artificial Intelligence

SupLID is a framework for out-of-distribution (OOD) detection in semantic segmentation that focuses on pixel-level anomaly localization. It moves beyond traditional image-level techniques by using Linear Intrinsic Dimensionality (LID) to guide classifier-derived OOD scores.

MonoSR: Open-Vocabulary Spatial Reasoning from Monocular Images

arXiv cs.CV • a day ago

Positive • Artificial Intelligence

MonoSR has been introduced as a large-scale monocular spatial reasoning dataset, addressing the need for effective spatial reasoning from 2D images across various environments, including indoor, outdoor, and object-centric scenarios. This dataset supports multiple question types, paving the way for advancements in embodied AI and autonomous driving applications.

A Tri-Modal Dataset and a Baseline System for Tracking Unmanned Aerial Vehicles

arXiv cs.CV • a day ago

Positive • Artificial Intelligence

A new dataset named MM-UAV has been introduced, designed for tracking unmanned aerial vehicles (UAVs) using a multi-modal approach that includes RGB, infrared, and event signals. This dataset features over 30 challenging scenarios with 1,321 synchronized sequences and more than 2.8 million annotated frames, addressing the limitations of single-modality tracking in difficult conditions.

MambaRefine-YOLO: A Dual-Modality Small Object Detector for UAV Imagery

arXiv cs.CV • a day ago

Positive • Artificial Intelligence

MambaRefine-YOLO has been introduced as a dual-modality small object detector specifically designed for Unmanned Aerial Vehicle (UAV) imagery, addressing the challenges of low resolution and background clutter in small object detection. The model incorporates a Dual-Gated Complementary Mamba fusion module (DGC-MFM) and a Hierarchical Feature Aggregation Neck (HFAN), achieving a state-of-the-art mean Average Precision (mAP) of 83.2% on the DroneVehicle dataset.

Unified Low-Light Traffic Image Enhancement via Multi-Stage Illumination Recovery and Adaptive Noise Suppression

arXiv cs.CV • a day ago

Positive • Artificial Intelligence

A new study presents a fully unsupervised, multi-stage deep learning framework for enhancing low-light traffic images. It addresses poor visibility, noise, and motion blur, all of which affect autonomous driving and urban surveillance. The model employs three specialized modules (Illumination Adaptation, Reflectance Restoration, and Over-Exposure Compensation) to improve image quality.

CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation

arXiv cs.CV • 2 days ago

Positive • Artificial Intelligence

CleverDistiller is a self-supervised cross-modal knowledge distillation framework that transfers features from 2D vision foundation models to 3D LiDAR-based models. It uses a direct feature similarity loss and a multi-layer perceptron projection head, which helps the 3D model learn complex semantic dependencies in autonomous driving applications.

Text2Traffic: A Text-to-Image Generation and Editing Method for Traffic Scenes

arXiv cs.CV • 2 days ago

Positive • Artificial Intelligence

A new method called Text2Traffic has been introduced for generating and editing images of traffic scenes, addressing challenges in intelligent transportation systems. This unified framework enhances the semantic richness and visual fidelity of generated images, which is crucial for applications like traffic monitoring and autonomous driving.

QueryOcc: Query-based Self-Supervision for 3D Semantic Occupancy

arXiv cs.CV • 2 days ago

Positive • Artificial Intelligence

QueryOcc is a query-based self-supervised framework that learns continuous 3D semantic occupancy directly from sensor data. It targets the challenge of capturing 3D scene geometry and semantics, particularly for autonomous driving applications.
