DGL-RSIS: Decoupling Global Spatial Context and Local Class Semantics for Training-Free Remote Sensing Image Segmentation
Artificial Intelligence
The emergence of vision-language models (VLMs) has significantly advanced multimodal understanding, yet applying them to remote sensing image segmentation remains difficult because of the domain gap between natural and remote sensing imagery and the diversity of textual inputs. The DGL-RSIS framework addresses this by decoupling visual and textual representations: a Global-Local Decoupling (GLD) module splits textual input into local class tokens and global context tokens. A Local Visual-Textual Alignment (LVTA) module then matches the local tokens against context-aware visual features, enabling training-free open-vocabulary semantic segmentation, while a Global Visual-Textual Alignment (GVTA) module exploits the global context to ground referring expressions. This approach bridges vision and language without task-specific training and sets a precedent for future training-free methods in remote sensing.
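The decouple-then-align idea above can be sketched in a few lines. The following is a minimal, illustrative toy only: the deterministic hash-based embeddings stand in for the VLM's text and vision encoders, and the function names (`decouple`, `align`) and the last-word heuristic for the local class token are assumptions for illustration, not the paper's actual API or method.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Toy deterministic encoder standing in for a VLM text/vision tower."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

def decouple(expression: str) -> tuple[np.ndarray, np.ndarray]:
    """Split text into a local class token (here crudely: the last word)
    and a global context token (the full expression)."""
    local_tok = embed(expression.split()[-1])
    global_tok = embed(expression)
    return local_tok, global_tok

def align(text_vec: np.ndarray, region_vecs: np.ndarray) -> tuple[int, np.ndarray]:
    """Cosine-similarity alignment: pick the region feature that best
    matches the text vector (all vectors are unit-norm)."""
    sims = region_vecs @ text_vec
    return int(np.argmax(sims)), sims

# Region features would come from mask proposals in a real pipeline;
# here we fake them by embedding class names directly.
regions = np.stack([embed(name) for name in ["runway", "harbor", "storage tank"]])
local_tok, global_tok = decouple("the plane parked on the runway")
best_region, sims = align(local_tok, regions)  # local class token → "runway"
```

Because the local class token "runway" is embedded identically to the "runway" region feature in this toy setup, the local alignment selects region 0; the global token would additionally carry sentence-level context, which is what GVTA uses for referring expressions.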
— via World Pulse Now AI Editorial System
