ChangeDINO: DINOv3-Driven Building Change Detection in Optical Remote Sensing Imagery

arXiv — cs.CV · Friday, November 21, 2025 at 5:00:00 AM
  • ChangeDINO introduces an innovative approach to building change detection in optical remote sensing imagery, leveraging a multiscale Siamese framework and DINOv3 features to improve accuracy and robustness.
  • This advancement is significant as it addresses limitations in existing deep learning methods, particularly in handling illumination variations and sparse labels, thereby enhancing the reliability of change detection in various conditions.
  • The development aligns with ongoing efforts in the AI field to improve object tracking and environmental monitoring, as seen in related frameworks like SwiTrack and applications of DINOv3 in probabilistic rainfall forecasting.
— via World Pulse Now AI Editorial System
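For readers who want the shape of the method: the pattern is a frozen foundation backbone applied to both image dates, per-scale feature differencing, and a light change decoder. The sketch below is a minimal illustration of that multiscale Siamese pattern, assuming a stand-in backbone where the real system would load DINOv3 weights; it is not the paper's exact architecture.

```python
# Minimal sketch of a multiscale Siamese change-detection head. BackboneStub
# stands in for a frozen DINOv3 encoder; fusion and decoding are illustrative.
import torch
import torch.nn as nn

class BackboneStub(nn.Module):
    """Placeholder for a frozen encoder returning multiscale feature maps."""
    def __init__(self, dims=(96, 192, 384)):
        super().__init__()
        self.stages = nn.ModuleList()
        in_ch = 3
        for d in dims:
            self.stages.append(nn.Sequential(
                nn.Conv2d(in_ch, d, 3, stride=2, padding=1), nn.GELU()))
            in_ch = d

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # coarse-to-fine multiscale features

class SiameseChangeHead(nn.Module):
    def __init__(self, dims=(96, 192, 384)):
        super().__init__()
        self.backbone = BackboneStub(dims)
        for p in self.backbone.parameters():
            p.requires_grad = False              # frozen foundation features
        self.fuse = nn.ModuleList(
            nn.Conv2d(2 * d, d, 1) for d in dims)  # per-scale pair fusion
        self.classify = nn.Conv2d(sum(dims), 1, 1)

    def forward(self, img_t1, img_t2):
        f1s, f2s = self.backbone(img_t1), self.backbone(img_t2)
        target = f1s[0].shape[-2:]               # upsample all to finest scale
        merged = []
        for f1, f2, fuse in zip(f1s, f2s, self.fuse):
            d = fuse(torch.cat([f1 - f2, f1 * f2], dim=1))
            merged.append(nn.functional.interpolate(
                d, size=target, mode="bilinear", align_corners=False))
        return self.classify(torch.cat(merged, dim=1))  # change-mask logits

logits = SiameseChangeHead()(torch.rand(1, 3, 256, 256),
                             torch.rand(1, 3, 256, 256))
print(logits.shape)  # torch.Size([1, 1, 128, 128])
```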


Continue Reading
FisheyeGaussianLift: BEV Feature Lifting for Surround-View Fisheye Camera Perception
Positive · Artificial Intelligence
A new framework named FisheyeGaussianLift has been introduced, which enhances BEV (Bird's Eye View) semantic segmentation from fisheye camera imagery. This method addresses challenges such as non-linear distortion and occlusion by using calibrated geometric unprojection and depth distribution estimation, achieving strong segmentation performance in complex environments.
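As a rough illustration of the lifting step, the sketch below places a categorical depth distribution along calibrated rays and scatters depth-weighted features into a BEV grid, in the spirit of Lift-Splat; the ray model, bin range, and grid size are assumptions, not the paper's Gaussian formulation.

```python
# Illustrative lift step: per-pixel depth distribution along calibrated rays,
# then scatter of depth-weighted features into a BEV grid.
import torch

H, W, D, C = 8, 8, 16, 32                 # image grid, depth bins, channels
feats = torch.randn(H * W, C)             # per-pixel image features
depth_prob = torch.randn(H * W, D).softmax(-1)  # categorical depth dist.

rays = torch.nn.functional.normalize(torch.randn(H * W, 3), dim=-1)
bins = torch.linspace(2.0, 40.0, D)       # metric depth bin centers (assumed)
pts = rays[:, None, :] * bins[None, :, None]          # (HW, D, 3) 3D points
lifted = feats[:, None, :] * depth_prob[..., None]    # depth-weighted feats

# Splat into a 100m x 100m BEV grid at 1m resolution, ego at the center.
G = 100
ix = (pts[..., 0] + 50).long().clamp(0, G - 1)
iy = (pts[..., 1] + 50).long().clamp(0, G - 1)
bev = torch.zeros(G * G, C)
bev.index_add_(0, (ix * G + iy).flatten(), lifted.reshape(-1, C))
bev = bev.view(G, G, C)                   # BEV feature map for segmentation
```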
MMT-ARD: Multimodal Multi-Teacher Adversarial Distillation for Robust Vision-Language Models
Positive · Artificial Intelligence
A new framework called MMT-ARD has been proposed to enhance the robustness of Vision-Language Models (VLMs) through a Multimodal Multi-Teacher Adversarial Distillation approach. This method addresses the limitations of traditional single-teacher distillation by incorporating a dual-teacher knowledge fusion architecture, which optimizes both clean feature preservation and robust feature enhancement.
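A minimal sketch of what a dual-teacher objective can look like, assuming a clean teacher supervising natural inputs and a robust teacher supervising adversarial ones; the fusion weight `alpha` and the interfaces below are illustrative, not the paper's fusion architecture.

```python
# Dual-teacher distillation sketch: KL to a clean teacher on natural inputs,
# KL to a robust teacher on adversarial inputs, mixed by a scalar weight.
import torch
import torch.nn.functional as F

def kl(student_logits, teacher_logits, T=4.0):
    # Standard temperature-scaled distillation divergence.
    return F.kl_div(F.log_softmax(student_logits / T, -1),
                    F.softmax(teacher_logits / T, -1),
                    reduction="batchmean") * T * T

def dual_teacher_loss(student, clean_teacher, robust_teacher,
                      x, x_adv, alpha=0.5):
    loss_clean = kl(student(x), clean_teacher(x).detach())
    loss_robust = kl(student(x_adv), robust_teacher(x_adv).detach())
    return alpha * loss_clean + (1 - alpha) * loss_robust

# Toy usage with linear "models" and a crude perturbation.
m = lambda: torch.nn.Linear(16, 10)
x = torch.randn(8, 16)
x_adv = x + 0.03 * torch.randn_like(x).sign()
print(dual_teacher_loss(m(), m(), m(), x, x_adv))
```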
QuantFace: Efficient Quantization for Face Restoration
Positive · Artificial Intelligence
A novel low-bit quantization framework named QuantFace has been introduced to enhance face restoration models, which have been limited by heavy computational demands. This framework quantizes full-precision weights and activations from 32-bit down to 4–6 bits, employing techniques like rotation-scaling channel balancing and Quantization-Distillation Low-Rank Adaptation (QD-LoRA) to optimize performance.
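The basic operation behind any 4–6 bit scheme is a uniform quantize/dequantize of the weights; the sketch below shows that step alone, with per-output-channel scales, and omits the paper's rotation-scaling channel balancing and QD-LoRA.

```python
# Symmetric uniform weight quantization to b bits with per-channel scales.
import torch

def quantize_dequantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    qmax = 2 ** (bits - 1) - 1                        # e.g. 7 for 4-bit signed
    scale = w.abs().amax(dim=1, keepdim=True) / qmax  # per-output-channel
    q = (w / scale).round().clamp(-qmax - 1, qmax)    # snap to integer grid
    return q * scale                                  # dequantized weights

w = torch.randn(64, 128)
w_q = quantize_dequantize(w, bits=4)
print((w - w_q).abs().mean())  # mean quantization error
```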
Draft and Refine with Visual Experts
Positive · Artificial Intelligence
Recent advancements in Large Vision-Language Models (LVLMs) have led to the introduction of the Draft and Refine (DnR) framework, which enhances the models' reasoning capabilities by quantifying their reliance on visual evidence through a question-conditioned utilization metric. This approach aims to reduce ungrounded or hallucinated responses by refining initial drafts with targeted feedback from visual experts.
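One plausible reading of a question-conditioned utilization metric is the divergence between the model's answer distribution with the real image and with the image ablated; the sketch below shows that idea against a hypothetical `model(image, question_ids)` interface, and the paper's actual definition may differ.

```python
# Speculative utilization score: how much the answer distribution shifts
# when the visual evidence is removed.
import torch
import torch.nn.functional as F

def visual_utilization(model, image, blank, question_ids):
    """KL divergence between answers with and without visual evidence."""
    with torch.no_grad():
        p_img = F.log_softmax(model(image, question_ids), dim=-1)
        p_blank = F.softmax(model(blank, question_ids), dim=-1)
    # High divergence => the draft answer actually depends on the image.
    return F.kl_div(p_img, p_blank, reduction="batchmean")

# Toy stand-in for a VLM head, just to make the sketch executable.
toy = lambda img, q: img.flatten(1)[:, :10] + 0.1 * q.float().mean()
score = visual_utilization(toy, torch.rand(2, 3, 4, 4),
                           torch.zeros(2, 3, 4, 4), torch.ones(2, 5))
print(score)
```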
Comprehensive Evaluation of Prototype Neural Networks
Neutral · Artificial Intelligence
A comprehensive evaluation of prototype neural networks has been conducted, focusing on models such as ProtoPNet, ProtoPool, and PIPNet. The study applies a variety of metrics, including new ones proposed by the authors, to assess model interpretability across diverse datasets, including fine-grained and multi-label classification tasks. The code for these evaluations is available as an open-source library on GitHub.
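For context on what such metrics probe, the prototype activation at the heart of ProtoPNet-style models scores how closely each image patch matches a learned prototype; the sketch below follows the log-activation form from the original ProtoPNet paper, with toy shapes.

```python
# ProtoPNet-style prototype activation: high when some patch embedding lies
# close to a prototype vector in feature space.
import torch

def prototype_activations(z, prototypes, eps=1e-4):
    """z: (B, HW, C) patch features; prototypes: (P, C)."""
    d2 = torch.cdist(z, prototypes[None].expand(z.size(0), -1, -1)) ** 2
    sim = torch.log((d2 + 1) / (d2 + eps))   # large when distance is small
    return sim.amax(dim=1)                   # (B, P): best patch per prototype

acts = prototype_activations(torch.randn(2, 49, 64), torch.randn(10, 64))
print(acts.shape)  # torch.Size([2, 10])
```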
Reproducibility Report: Test-Time Training on Nearest Neighbors for Large Language Models
Positive · Artificial Intelligence
A recent reproducibility report confirms the effectiveness of Test-Time Training on Nearest Neighbors for Large Language Models, demonstrating that fine-tuning language models like GPT-2 and GPT-Neo during inference can significantly reduce perplexity across various datasets, particularly in specialized domains such as GitHub and EuroParl.
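The underlying loop is simple to sketch: retrieve the test input's nearest neighbors from a training-text index, briefly fine-tune the model on them, then evaluate. The sketch below assumes a Hugging Face-style causal LM and a hypothetical `index.search`; the hyperparameters are placeholders.

```python
# Test-time training on retrieved neighbors, assuming a Hugging Face-style
# causal LM whose forward pass returns a .loss when given labels.
import torch

def test_time_train(model, tokenizer, index, prompt, k=8, steps=4, lr=5e-5):
    neighbors = index.search(prompt, k)        # k similar training texts
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(steps):
        for text in neighbors:
            batch = tokenizer(text, return_tensors="pt")
            loss = model(**batch, labels=batch["input_ids"]).loss
            opt.zero_grad()
            loss.backward()
            opt.step()
    model.eval()
    return model   # now adapted to the prompt's local neighborhood
```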
Lost in Translation and Noise: A Deep Dive into the Failure Modes of VLMs on Real-World Tables
Neutral · Artificial Intelligence
The introduction of MirageTVQA, a new benchmark for evaluating Vision-Language Models (VLMs), exposes significant performance gaps that existing datasets, which focus primarily on monolingual and visually clean tables, fail to reveal. The benchmark includes nearly 60,000 QA pairs across 24 languages and incorporates realistic noise to better reflect real-world scenarios.
MuM: Multi-View Masked Image Modeling for 3D Vision
Positive · Artificial Intelligence
The recent paper titled 'MuM: Multi-View Masked Image Modeling for 3D Vision' introduces a novel approach to self-supervised learning, focusing on extracting visual representations from unlabeled data specifically for 3D understanding. The proposed model, MuM, builds on the concept of masked autoencoding and extends it to multiple views of the same scene, aiming for simplicity and scalability compared to previous methods like CroCo.
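A toy version of the idea, assuming pre-tokenized patch features: mask a large fraction of tokens across a joint sequence of views of the same scene and train the network to reconstruct them. The cross-view handling and loss below follow generic masked-autoencoding conventions, not necessarily MuM's exact design.

```python
# Toy multi-view masked image modeling: views are concatenated into one
# token sequence, a random subset is replaced by a mask token, and the
# model is trained to reconstruct the masked tokens.
import torch
import torch.nn as nn

class TinyMultiViewMAE(nn.Module):
    def __init__(self, dim=128, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, 2)
        self.decoder = nn.Linear(dim, dim)    # predict masked patch features

    def forward(self, views):                 # views: (B, V, N, dim) tokens
        B, V, N, dim = views.shape
        tokens = views.reshape(B, V * N, dim)       # joint cross-view sequence
        n_mask = int(self.mask_ratio * V * N)
        idx = torch.rand(B, V * N).argsort(-1)[:, :n_mask]  # random positions
        masked = tokens.clone()
        masked.scatter_(1, idx[..., None].expand(-1, -1, dim),
                        self.mask_token.expand(B, n_mask, dim))
        pred = self.decoder(self.encoder(masked))
        target = tokens.gather(1, idx[..., None].expand(-1, -1, dim))
        pred_m = pred.gather(1, idx[..., None].expand(-1, -1, dim))
        return ((pred_m - target) ** 2).mean()      # reconstruction loss

loss = TinyMultiViewMAE()(torch.randn(2, 2, 64, 128))  # 2 views per scene
print(loss)
```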