When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection
Neutral · Artificial Intelligence
- A recent study examines how vulnerable Large Language Model (LLM)-based scientific reviewers are to indirect prompt injection, in which text embedded in a submission manipulates the reviewer into flipping a peer-review decision from 'Reject' to 'Accept'. The work introduces a new metric, the Weighted Adversarial Vulnerability Score (WAVS), and evaluates 15 attack strategies across 13 LLMs, including GPT-5 and DeepSeek, on a dataset of 200 scientific papers (see the sketch after this list).
- The findings matter because they highlight the risks of growing reliance on LLMs in scientific peer review, particularly as organizations such as AAAI and Stanford adopt AI-driven assessment systems. Understanding these vulnerabilities is essential for preserving the integrity of scientific evaluation.
- This development reflects broader concerns regarding the reliability of AI in critical tasks, such as political fact-checking and error detection in published literature. As LLMs become more integrated into various domains, the need for robust evaluation frameworks and safeguards against manipulation becomes increasingly urgent.
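The summary does not define how WAVS is computed, so the Python sketch below is purely illustrative: it shows one plausible way to aggregate per-attack decision-flip rates into a single weighted score per reviewer model. The attack names, weights, and flip rates are invented placeholders, not results from the study.

```python
# Illustrative sketch only: WAVS is not specified in the summary, so the
# weighting scheme, attack names, and numbers here are hypothetical.

def weighted_adversarial_vulnerability_score(flip_rates, weights):
    """Aggregate per-attack decision-flip rates (Reject -> Accept) into a
    weighted score for one LLM reviewer.

    flip_rates: dict mapping attack name -> fraction of papers whose review
                decision flipped after the injected text was added.
    weights:    dict mapping attack name -> relative weight (e.g., reflecting
                attack stealth or practicality); need not sum to 1.
    """
    total_weight = sum(weights[a] for a in flip_rates)
    return sum(weights[a] * flip_rates[a] for a in flip_rates) / total_weight


# Hypothetical example: three injection strategies evaluated on one reviewer model.
flip_rates = {
    "hidden_white_text": 0.42,   # instruction hidden in the PDF text layer
    "fake_reviewer_note": 0.31,  # injected "meta-review" paragraph
    "benign_praise": 0.08,       # subtle flattering framing, no explicit command
}
weights = {
    "hidden_white_text": 1.0,
    "fake_reviewer_note": 0.8,
    "benign_praise": 0.5,
}

print(f"WAVS (sketch): {weighted_adversarial_vulnerability_score(flip_rates, weights):.3f}")
```

In this sketch, stealthier or more practical attacks receive larger weights, so a model that only falls for blatant injections scores lower than one that also succumbs to subtle framing.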
— via World Pulse Now AI Editorial System