DINOv2 Driven Gait Representation Learning for Video-Based Visible-Infrared Person Re-identification

arXiv — cs.CV · Friday, November 7, 2025 at 5:00:00 AM
A new study applies DINOv2 to video-based visible-infrared person re-identification, focusing on gait features as a cue for cross-modal video matching. The work addresses a limitation of existing methods, which often ignore dynamic gait information that can improve the accuracy of identifying individuals across different visual modalities.
— via World Pulse Now AI Editorial System
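
As a rough illustration of the kind of pipeline such work builds on, the sketch below extracts per-frame DINOv2 features from a tracklet and average-pools them into a clip-level descriptor for cosine matching across modalities. Only the DINOv2 torch.hub entry point is the library's real API; the pooling strategy, tensor shapes, and matching step are assumptions for illustration, not the paper's actual method.

```python
# Minimal sketch (not the paper's method): per-frame DINOv2 features
# pooled into a clip descriptor for cross-modal (visible/infrared) matching.
import torch

# DINOv2 backbones are published on torch.hub by facebookresearch/dinov2.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

@torch.no_grad()
def tracklet_descriptor(frames: torch.Tensor) -> torch.Tensor:
    """frames: (T, 3, H, W), with H and W multiples of 14 (the patch size).
    Returns an L2-normalized (384,) clip-level descriptor."""
    feats = model(frames)        # (T, 384) CLS features, one per frame
    clip = feats.mean(dim=0)     # temporal average pooling over the tracklet
    return clip / clip.norm()    # cosine-ready embedding

# Hypothetical cross-modal matching: cosine similarity between a
# visible-light query clip and an infrared gallery clip (random tensors
# stand in for preprocessed frames here).
vis = tracklet_descriptor(torch.randn(16, 3, 224, 224))
ir = tracklet_descriptor(torch.randn(16, 3, 224, 224))
score = torch.dot(vis, ir).item()
```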

Continue Reading
A Tri-Modal Dataset and a Baseline System for Tracking Unmanned Aerial Vehicles
Positive · Artificial Intelligence
A new dataset named MM-UAV has been introduced, designed for tracking unmanned aerial vehicles (UAVs) using a multi-modal approach that includes RGB, infrared, and event signals. This dataset features over 30 challenging scenarios with 1,321 synchronized sequences and more than 2.8 million annotated frames, addressing the limitations of single-modality tracking in difficult conditions.
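
MM-UAV's actual release format is not described in this summary, so the directory layout and field names below are purely hypothetical; the sketch only illustrates how a synchronized tri-modal sample with RGB, infrared, event data, and a box annotation might be represented.

```python
# Illustrative sketch only: file layout and naming are assumptions,
# not MM-UAV's published format.
from dataclasses import dataclass
from pathlib import Path
import numpy as np

@dataclass
class TriModalFrame:
    rgb: np.ndarray       # (H, W, 3) visible-light frame
    infrared: np.ndarray  # (H, W) thermal frame, registered to RGB
    events: np.ndarray    # (N, 4) event stream: x, y, timestamp, polarity
    bbox: np.ndarray      # (4,) annotated UAV box: x, y, w, h

def load_sequence(root: Path) -> list[TriModalFrame]:
    """Pair frames across modalities by a shared index (assumed convention)."""
    samples = []
    for rgb_path in sorted((root / "rgb").glob("*.npy")):
        idx = rgb_path.stem
        samples.append(TriModalFrame(
            rgb=np.load(rgb_path),
            infrared=np.load(root / "ir" / f"{idx}.npy"),
            events=np.load(root / "events" / f"{idx}.npy"),
            bbox=np.load(root / "anno" / f"{idx}.npy"),
        ))
    return samples
```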
MambaRefine-YOLO: A Dual-Modality Small Object Detector for UAV Imagery
Positive · Artificial Intelligence
MambaRefine-YOLO has been introduced as a dual-modality small object detector specifically designed for Unmanned Aerial Vehicle (UAV) imagery, addressing the challenges of low resolution and background clutter in small object detection. The model incorporates a Dual-Gated Complementary Mamba fusion module (DGC-MFM) and a Hierarchical Feature Aggregation Neck (HFAN), achieving a state-of-the-art mean Average Precision (mAP) of 83.2% on the DroneVehicle dataset.
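
The paper's DGC-MFM is Mamba-based; the simplified PyTorch sketch below shows only the generic dual-gated fusion idea, where each modality produces a gate that controls how much of the other modality's features are mixed in. The module structure and parameter names are invented for illustration and are not the published architecture.

```python
# Simplified illustration of dual-gated cross-modality fusion
# (not the paper's DGC-MFM, which is Mamba-based).
import torch
import torch.nn as nn

class DualGatedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # One 1x1 conv per direction produces a [0, 1] spatial gate.
        self.gate_rgb = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.gate_ir = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        # Each branch keeps its own features plus whatever complementary
        # signal the other modality's gate lets through.
        fused_rgb = rgb + self.gate_ir(ir) * ir    # IR complements RGB
        fused_ir = ir + self.gate_rgb(rgb) * rgb   # RGB complements IR
        return fused_rgb + fused_ir                # merged dual-modality map

fusion = DualGatedFusion(channels=256)
out = fusion(torch.randn(1, 256, 64, 64), torch.randn(1, 256, 64, 64))
```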
DeepCoT: Deep Continual Transformers for Real-Time Inference on Data Streams
Positive · Artificial Intelligence
DeepCoT (Deep Continual Transformers) advances real-time inference on data streams by reducing the high computational cost and redundant computation of existing models. The encoder-only design scales to deep architectures while maintaining performance across audio, video, and text streams.
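
DeepCoT's exact architecture isn't detailed in this summary, so the sketch below illustrates only the general continual-inference idea such models build on: caching keys and values for past stream tokens so each new token costs one attention row over a fixed window rather than a full recomputation. All names, shapes, and the single-head setup are illustrative assumptions.

```python
# Minimal sketch of continual (streaming) attention with a rolling
# key/value cache; not DeepCoT's architecture.
import torch
import torch.nn.functional as F

class StreamingAttention:
    def __init__(self, dim: int, window: int):
        self.w_qkv = torch.randn(dim, 3 * dim) / dim ** 0.5
        self.window = window
        self.k_cache: list[torch.Tensor] = []
        self.v_cache: list[torch.Tensor] = []

    def step(self, x: torch.Tensor) -> torch.Tensor:
        """x: (dim,) newest stream token; returns its attention output."""
        q, k, v = (x @ self.w_qkv).chunk(3)
        self.k_cache.append(k)
        self.v_cache.append(v)
        # Evict the oldest entry once the receptive window is full,
        # keeping per-step cost O(window) regardless of stream length.
        if len(self.k_cache) > self.window:
            self.k_cache.pop(0)
            self.v_cache.pop(0)
        keys = torch.stack(self.k_cache)                        # (<=window, dim)
        attn = F.softmax(q @ keys.T / keys.shape[1] ** 0.5, dim=-1)
        return attn @ torch.stack(self.v_cache)

layer = StreamingAttention(dim=64, window=16)
for t in range(100):                   # simulate a 100-token stream
    out = layer.step(torch.randn(64))  # constant cost per new token
```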