arXiv:2511.02564v1 Announce Type: new 
Abstract: Video-based person re-identification (ReID) in cross-view domains (for example, aerial-ground surveillance) remains an open problem because of extreme viewpoint shifts, scale disparities, and temporal inconsistencies. To address these challenges, we propose MTF-CVReID, a parameter-efficient framework that introduces seven complementary modules over a ViT-B/16 backbone. Specifically, we include: (1) Cross-Stream Feature Normalization (CSFN) to correct camera and view biases; (2) Multi-Resolution Feature Harmonization (MRFH) for scale stabilization across altitudes; (3) Identity-Aware Memory Module (IAMM) to reinforce persistent identity traits; (4) Temporal Dynamics Modeling (TDM) for motion-aware short-term temporal encoding; (5) Inter-View Feature Alignment (IVFA) for perspective-invariant representation alignment; (6) Hierarchical Temporal Pattern Learning (HTPL) to capture multi-scale temporal regularities; and (7) Multi-View Identity Consistency Learning (MVICL) that enforces cross-view identity coherence using a contrastive learning paradigm. Despite adding only about 2 million parameters and 0.7 GFLOPs over the baseline, MTF-CVReID maintains real-time efficiency (189 FPS) and achieves state-of-the-art performance on the AG-VPReID benchmark across all altitude levels, with strong cross-dataset generalization to the G2A-VReID and MARS datasets. These results show that carefully designed adapter-based modules can substantially enhance cross-view robustness and temporal consistency without compromising computational efficiency. The source code is available at https://github.com/MdRashidunnabi/MTF-CVReID
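The abstract says only that MVICL enforces cross-view identity coherence "using a contrastive learning paradigm". As a rough illustration of that idea, the sketch below computes a symmetric InfoNCE loss between aerial and ground clip embeddings, assuming each batch holds one aerial and one ground embedding per identity at the same row index. The function names, the 0.1 temperature, and this pairing scheme are assumptions for illustration, not the paper's actual loss, and NumPy stands in for a training framework.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm (guarding against division by zero)."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def cross_view_infonce(aerial, ground, temperature=0.1):
    """Hypothetical symmetric InfoNCE loss between two views.

    aerial, ground: (B, D) clip embeddings; row i of each is assumed to be
    the same identity (positive pair), and all other rows act as negatives.
    """
    a = l2_normalize(aerial)
    g = l2_normalize(ground)
    logits = a @ g.T / temperature  # (B, B) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    # Aerial -> ground: each aerial row should pick its ground partner (softmax over columns).
    loss_a2g = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Ground -> aerial: each ground column should pick its aerial partner (softmax over rows).
    loss_g2a = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_a2g + loss_g2a)


# Usage: 8 identities with 768-d embeddings (the ViT-B/16 hidden size).
rng = np.random.default_rng(0)
ground = rng.normal(size=(8, 768))
aerial_matched = ground + 0.05 * rng.normal(size=(8, 768))  # same identities, small view noise
aerial_random = rng.normal(size=(8, 768))                    # unrelated embeddings
print(cross_view_infonce(aerial_matched, ground))  # close to 0
print(cross_view_infonce(aerial_random, ground))   # roughly log(8)
```

The loss approaches zero only when each aerial embedding is closer to its own ground counterpart than to every other identity in the batch; a supervised variant would instead treat every same-identity pair in the batch as a positive.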

A new framework called MTF-CVReID has been introduced to tackle the challenges of video-based person re-identification across different views. The approach builds on a ViT-B/16 backbone and adds seven lightweight, adapter-based modules, including Cross-Stream Feature Normalization, to improve robustness to viewpoint shifts and scale disparities.
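The summary names Cross-Stream Feature Normalization without describing it, and the abstract says only that CSFN corrects camera and view biases. One common way to remove such biases is to standardize features within each camera or view group, so that systematic per-view shifts in mean and scale disappear before matching. The sketch below shows that per-view standardization in NumPy; `per_view_normalize`, its arguments, and the use of per-group batch statistics are hypothetical choices, not the CSFN design.

```python
import numpy as np


def per_view_normalize(features, view_ids, eps=1e-5):
    """Hypothetical per-view standardization (not the paper's CSFN).

    features: (N, D) array of embeddings; view_ids: (N,) integer camera/view labels.
    Each view group is shifted to zero mean and scaled to (approximately) unit
    variance per channel, using that group's own statistics.
    """
    features = np.asarray(features, dtype=float)
    view_ids = np.asarray(view_ids)
    out = np.empty_like(features)
    for v in np.unique(view_ids):
        mask = view_ids == v
        group = features[mask]
        mean = group.mean(axis=0, keepdims=True)
        var = group.var(axis=0, keepdims=True)
        out[mask] = (group - mean) / np.sqrt(var + eps)  # eps avoids division by zero
    return out


# Usage: ground-view features are offset and widened, aerial-view features shrunk.
rng = np.random.default_rng(1)
ground = rng.normal(loc=3.0, scale=2.0, size=(50, 16))
aerial = rng.normal(loc=-1.0, scale=0.5, size=(50, 16))
feats = np.vstack([ground, aerial])
views = np.array([0] * 50 + [1] * 50)
normed = per_view_normalize(feats, views)
for v in (0, 1):
    group = normed[views == v]
    print(v, group.mean().round(3), group.std(axis=0).mean().round(3))
```

A trainable version would typically follow the standardization with a learned per-channel scale and shift, as batch normalization does.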

Seeing Across Time and Views: Multi-Temporal Cross-View Learning for Robust Video Person Re-Identification