When Robots Obey the Patch: Universal Transferable Patch Attacks on Vision-Language-Action Models

arXiv — cs.CV — Thursday, November 27, 2025 at 5:00:00 AM
  • A systematic study examines universal, transferable adversarial patches against Vision-Language-Action (VLA) models and finds them broadly vulnerable. The proposed UPA-RFAS framework learns a single physical patch that transfers effectively across models, addressing the tendency of existing attacks to overfit to one specific architecture; a hedged sketch of the general idea follows the summary below.
  • This matters because VLA-driven robots are being deployed in a growing range of applications. Understanding how well adversarial patches transfer lets researchers assess these systems and fortify them against realistic threats rather than only architecture-specific ones.
  • The work adds to ongoing concerns about the security and reliability of multimodal AI systems. As these technologies advance, robust defenses against both white-box and black-box attacks become critical, underscoring the need for comprehensive strategies to safeguard VLA-driven robots against such vulnerabilities.
— via World Pulse Now AI Editorial System
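
The summary does not describe the UPA-RFAS optimization itself, but the general recipe for a universal, transferable patch can be sketched as ensemble-based optimization: a single patch trained over many scenes and several surrogate models, so it cannot overfit to any one architecture. The snippet below is a minimal sketch under that assumption, not the paper's method; the surrogate interface (`action_loss`), the `apply_patch` helper, and all hyperparameters are hypothetical placeholders.

```python
# Minimal sketch of ensemble-based universal patch optimization.
# NOT the UPA-RFAS method: model interfaces and hyperparameters are assumed.
import torch

def apply_patch(images, patch, top=20, left=20):
    """Paste a square patch onto a batch of images at a fixed location."""
    patched = images.clone()
    h, w = patch.shape[-2:]
    patched[:, :, top:top + h, left:left + w] = patch
    return patched

def optimize_universal_patch(surrogate_models, data_loader, target_action,
                             patch_size=64, steps=500, lr=0.01):
    """Train one patch against several surrogate VLA models so it transfers."""
    patch = torch.rand(3, patch_size, patch_size, requires_grad=True)
    optimizer = torch.optim.Adam([patch], lr=lr)
    data_iter = iter(data_loader)

    for _ in range(steps):
        try:
            images, _ = next(data_iter)               # batch of scene images
        except StopIteration:
            data_iter = iter(data_loader)
            images, _ = next(data_iter)

        patched = apply_patch(images, patch.clamp(0, 1))

        # Average the attack loss over all surrogate models: a patch that only
        # fools one architecture scores worse than one that fools them all,
        # which is what encourages transfer to unseen models.
        loss = sum(m.action_loss(patched, target_action)   # hypothetical API
                   for m in surrogate_models) / len(surrogate_models)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        patch.data.clamp_(0, 1)                       # keep the patch printable

    return patch.detach()
```

Averaging the loss over the ensemble is the transfer mechanism in this sketch: gradient directions that only help against a single surrogate are diluted, so the patch settles on features shared across architectures.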


Continue Reading
Attention-Guided Patch-Wise Sparse Adversarial Attacks on Vision-Language-Action Models
Positive · Artificial Intelligence
A new framework named ADVLA strengthens adversarial attacks on Vision-Language-Action (VLA) models by perturbing the features that the visual encoder projects into the textual space. Because the perturbations are attention-guided and sparse, the attack reaches a nearly 100% success rate while modifying less than 10% of the image patches under strict constraints; a hedged sketch of this idea appears after the list below.
When Alignment Fails: Multimodal Adversarial Attacks on Vision-Language-Action Models
Neutral · Artificial Intelligence
VLA-Fool is a study of the adversarial robustness of Vision-Language-Action (VLA) models under both white-box and black-box conditions. It highlights how vulnerable VLAs are, particularly to cross-modal misalignment between visual and textual inputs that can derail their decision-making.
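
The ADVLA entry above describes attention-guided, sparse perturbations applied to the features a VLA's visual encoder projects into the textual space. As a rough illustration of that idea only (not the ADVLA implementation), the sketch below keeps the most-attended roughly 10% of patch tokens and perturbs just those; `vla_model`, its methods, and the budget and step sizes are hypothetical placeholders.

```python
# Minimal sketch of an attention-guided, sparse perturbation on visual tokens.
# NOT the ADVLA implementation: the `vla_model` interface is assumed.
import torch

def sparse_feature_attack(vla_model, image, instruction, target_action,
                          budget=0.10, steps=50, eps=0.05, step_size=0.01):
    """Perturb only the most-attended visual tokens in the textual feature space."""
    with torch.no_grad():
        # Project image patches into the textual embedding space, as the
        # VLA's vision-to-language projector would (hypothetical interface).
        tokens = vla_model.visual_projector(vla_model.vision_encoder(image))
        scores = vla_model.attention_over_tokens(tokens, instruction)  # (num_tokens,)

    # Sparse mask: keep roughly `budget` (here 10%) of the patch tokens,
    # chosen by their attention score.
    k = max(1, int(budget * tokens.shape[0]))
    mask = torch.zeros_like(tokens)
    mask[scores.topk(k).indices] = 1.0

    delta = torch.zeros_like(tokens, requires_grad=True)
    for _ in range(steps):
        perturbed = tokens + delta * mask             # touch selected tokens only
        loss = vla_model.action_loss_from_tokens(perturbed, instruction,
                                                 target_action)
        loss.backward()
        with torch.no_grad():
            delta -= step_size * delta.grad.sign()    # step toward the target action
            delta.clamp_(-eps, eps)                   # keep each change small
        delta.grad.zero_()

    return (tokens + delta * mask).detach()
```

Because only the masked tokens are ever updated, the perturbation stays sparse by construction, mirroring the under-10%-of-patches constraint described in the ADVLA entry.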