Text-based Aerial-Ground Person Retrieval

arXiv (cs.CV) · Wednesday, November 12, 2025 at 5:00:00 AM
Text-based Aerial-Ground Person Retrieval (TAG-PR) is a newly proposed retrieval task: given a textual description, retrieve images of the matching person captured from both aerial and ground viewpoints. To support the task, the authors introduce the TAG-PEDES dataset, which is constructed from public benchmarks and pairs images with automatically generated textual descriptions intended to stay robust under view heterogeneity. They also propose TAG-CLIP, a retrieval framework that handles viewpoint discrepancies with a mixture-of-experts module that learns both view-specific and view-agnostic features. TAG-CLIP is evaluated on TAG-PEDES as well as on existing benchmarks. The dataset and code are available on GitHub.
— via World Pulse Now AI Editorial System
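
To make the idea of combining view-agnostic and view-specific features more concrete, here is a minimal toy sketch in pure Python. It is not the TAG-CLIP implementation: the summary above gives no architectural details, so everything here is an assumption made for illustration, including the class name `ViewMoE`, the helper `rank_gallery`, the softmax gate, the random weights, and the use of cosine similarity to rank gallery images against a text embedding.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Random weights stand in for learned parameters in this toy example."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class ViewMoE:
    """Hypothetical layer: one shared expert plus gated view-specific experts."""

    def __init__(self, dim, num_view_experts=2, seed=0):
        rng = random.Random(seed)
        # Shared expert: applied to every input, regardless of viewpoint.
        self.shared = random_matrix(dim, dim, rng)
        # View-specific experts, e.g. one for aerial and one for ground images.
        self.experts = [random_matrix(dim, dim, rng) for _ in range(num_view_experts)]
        # Gate: maps the input feature to one score per view-specific expert.
        self.gate = random_matrix(num_view_experts, dim, rng)

    def forward(self, feature):
        """Return (view_agnostic, view_specific, gate_weights) for one feature."""
        weights = softmax(matvec(self.gate, feature))
        agnostic = matvec(self.shared, feature)
        specific = [0.0] * len(feature)
        for w, expert in zip(weights, self.experts):
            out = matvec(expert, feature)
            specific = [s + w * o for s, o in zip(specific, out)]
        return agnostic, specific, weights


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_gallery(text_embedding, gallery_embeddings):
    """Return gallery indices sorted from most to least similar to the text."""
    scores = [cosine(text_embedding, g) for g in gallery_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In this sketch, calling `ViewMoE(dim=4).forward(feature)` on a 4-dimensional image feature yields a view-agnostic vector, a gated mixture of view-specific vectors, and the gate weights. A retrieval system along these lines would presumably compare text embeddings against the view-agnostic part, which is what `rank_gallery` illustrates, but how TAG-CLIP actually routes and aligns features is described only in the paper and its code.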
