When Robots Obey the Patch: Universal Transferable Patch Attacks on Vision-Language-Action Models

arXiv — cs.CV — Thursday, November 27, 2025 at 5:00:00 AM
  • A systematic study examines universal, transferable adversarial patches against Vision-Language-Action (VLA) models and finds them broadly vulnerable. The proposed UPA-RFAS framework learns a single physical patch that transfers effectively across models, addressing the tendency of existing attacks to overfit to one specific architecture; a hedged sketch of the general idea follows the summary below.
  • This matters because VLA-driven robots are being deployed in a growing range of applications. Understanding how well adversarial patches transfer lets researchers assess these systems and fortify them against realistic threats rather than only architecture-specific ones.
  • The work adds to ongoing concerns about the security and reliability of multimodal AI systems. As these technologies advance, robust defenses against both white-box and black-box attacks become critical, underscoring the need for comprehensive strategies to safeguard VLA-driven robots against such vulnerabilities.
— via World Pulse Now AI Editorial System
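
The summary does not describe the UPA-RFAS optimization itself, but the general recipe for a universal, transferable patch can be sketched as ensemble-based optimization: a single patch trained over many scenes and several surrogate models, so it cannot overfit to any one architecture. The snippet below is a minimal sketch under that assumption, not the paper's method; the surrogate interface (`action_loss`), the `apply_patch` helper, and all hyperparameters are hypothetical placeholders.

```python
# Minimal sketch of ensemble-based universal patch optimization.
# NOT the UPA-RFAS method: model interfaces and hyperparameters are assumed.
import torch

def apply_patch(images, patch, top=20, left=20):
    """Paste a square patch onto a batch of images at a fixed location."""
    patched = images.clone()
    h, w = patch.shape[-2:]
    patched[:, :, top:top + h, left:left + w] = patch
    return patched

def optimize_universal_patch(surrogate_models, data_loader, target_action,
                             patch_size=64, steps=500, lr=0.01):
    """Train one patch against several surrogate VLA models so it transfers."""
    patch = torch.rand(3, patch_size, patch_size, requires_grad=True)
    optimizer = torch.optim.Adam([patch], lr=lr)
    data_iter = iter(data_loader)

    for _ in range(steps):
        try:
            images, _ = next(data_iter)               # batch of scene images
        except StopIteration:
            data_iter = iter(data_loader)
            images, _ = next(data_iter)

        patched = apply_patch(images, patch.clamp(0, 1))

        # Average the attack loss over all surrogate models: a patch that only
        # fools one architecture scores worse than one that fools them all,
        # which is what encourages transfer to unseen models.
        loss = sum(m.action_loss(patched, target_action)   # hypothetical API
                   for m in surrogate_models) / len(surrogate_models)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        patch.data.clamp_(0, 1)                       # keep the patch printable

    return patch.detach()
```

Averaging the loss over the ensemble is the transfer mechanism in this sketch: gradient directions that only help against a single surrogate are diluted, so the patch settles on features shared across architectures.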


Continue Reading
Attention-Guided Patch-Wise Sparse Adversarial Attacks on Vision-Language-Action Models
Positive · Artificial Intelligence
A new framework named ADVLA strengthens adversarial attacks on Vision-Language-Action (VLA) models by perturbing the features that the visual encoder projects into the textual space. Because the perturbations are attention-guided and sparse, the attack reaches a nearly 100% success rate while modifying less than 10% of the image patches under strict constraints; a hedged sketch of this idea appears after the list below.
When Alignment Fails: Multimodal Adversarial Attacks on Vision-Language-Action Models
Neutral · Artificial Intelligence
VLA-Fool is a study of the adversarial robustness of Vision-Language-Action (VLA) models under both white-box and black-box conditions. It highlights how vulnerable VLAs are, particularly to cross-modal misalignment between visual and textual inputs that can derail their decision-making.
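
The ADVLA entry above describes attention-guided, sparse perturbations applied to the features a VLA's visual encoder projects into the textual space. As a rough illustration of that idea only (not the ADVLA implementation), the sketch below keeps the most-attended roughly 10% of patch tokens and perturbs just those; `vla_model`, its methods, and the budget and step sizes are hypothetical placeholders.

```python
# Minimal sketch of an attention-guided, sparse perturbation on visual tokens.
# NOT the ADVLA implementation: the `vla_model` interface is assumed.
import torch

def sparse_feature_attack(vla_model, image, instruction, target_action,
                          budget=0.10, steps=50, eps=0.05, step_size=0.01):
    """Perturb only the most-attended visual tokens in the textual feature space."""
    with torch.no_grad():
        # Project image patches into the textual embedding space, as the
        # VLA's vision-to-language projector would (hypothetical interface).
        tokens = vla_model.visual_projector(vla_model.vision_encoder(image))
        scores = vla_model.attention_over_tokens(tokens, instruction)  # (num_tokens,)

    # Sparse mask: keep roughly `budget` (here 10%) of the patch tokens,
    # chosen by their attention score.
    k = max(1, int(budget * tokens.shape[0]))
    mask = torch.zeros_like(tokens)
    mask[scores.topk(k).indices] = 1.0

    delta = torch.zeros_like(tokens, requires_grad=True)
    for _ in range(steps):
        perturbed = tokens + delta * mask             # touch selected tokens only
        loss = vla_model.action_loss_from_tokens(perturbed, instruction,
                                                 target_action)
        loss.backward()
        with torch.no_grad():
            delta -= step_size * delta.grad.sign()    # step toward the target action
            delta.clamp_(-eps, eps)                   # keep each change small
        delta.grad.zero_()

    return (tokens + delta * mask).detach()
```

Because only the masked tokens are ever updated, the perturbation stays sparse by construction, mirroring the under-10%-of-patches constraint described in the ADVLA entry.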