ReviewGuard: Enhancing Deficient Peer Review Detection via LLM-Driven Data Augmentation

arXiv — cs.CL · Monday, November 24, 2025 at 5:00:00 AM
  • ReviewGuard has been introduced as an automated system designed to detect and categorize deficient peer reviews, using a four-stage framework of data collection, annotation, synthetic data augmentation, and model fine-tuning (a hypothetical sketch of this pipeline follows the bullets below). The work addresses growing concerns about the integrity of academic reviews, particularly in light of the increasing use of large language models (LLMs) in scholarly evaluation.
  • The development of ReviewGuard is significant as it aims to enhance the reliability of peer reviews, which are crucial for maintaining academic standards. By identifying deficient reviews, the system seeks to mitigate the risks posed by both human and AI-generated evaluations, thereby reinforcing the credibility of scientific discourse.
  • This advancement highlights ongoing challenges in the academic community regarding the quality of peer reviews, especially as LLMs become more prevalent. Related work on the lexical diversity of AI-generated text and on the reasoning capabilities of language models underscores the complexity of integrating AI into scholarly processes, raising questions about the balance between efficiency and quality in academic evaluation.
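The summary above names the four stages but gives no implementation details, so the following is only a minimal, hypothetical sketch of such a pipeline. Every name here (Review, collect_reviews, annotate, augment_with_llm, fine_tune, the "deficient"/"adequate" labels) is an assumption made for illustration, not the paper's actual code or taxonomy.

```python
# Hypothetical sketch of a four-stage pipeline: collect -> annotate -> augment -> fine-tune.
# Every function body is a placeholder stub; only the control flow mirrors the description.
from dataclasses import dataclass

@dataclass
class Review:
    text: str
    label: str | None = None   # e.g. "deficient" or "adequate" (assumed labels)
    synthetic: bool = False    # marks examples created during augmentation

def collect_reviews() -> list[Review]:
    """Stage 1: gather raw peer reviews (the real corpus is unspecified here)."""
    return [Review("The paper is fine."), Review("Reject. No reason given.")]

def annotate(reviews: list[Review]) -> list[Review]:
    """Stage 2: attach deficiency labels; a trivial length heuristic stands in
    for the paper's actual annotation step."""
    for r in reviews:
        r.label = "deficient" if len(r.text.split()) < 10 else "adequate"
    return reviews

def augment_with_llm(reviews: list[Review], n_per_label: int = 2) -> list[Review]:
    """Stage 3: synthesize extra labeled reviews with an LLM (stubbed generator)."""
    def llm_generate(prompt: str) -> str:
        return f"[synthetic review for prompt: {prompt}]"
    synthetic = [
        Review(llm_generate(f"Write a {label} peer review."), label=label, synthetic=True)
        for label in {r.label for r in reviews}
        for _ in range(n_per_label)
    ]
    return reviews + synthetic

def fine_tune(reviews: list[Review]) -> None:
    """Stage 4: fine-tune a classifier on real plus synthetic data (stubbed)."""
    pairs = [(r.text, r.label) for r in reviews]
    print(f"Would fine-tune on {len(pairs)} labeled reviews.")

fine_tune(augment_with_llm(annotate(collect_reviews())))
```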
— via World Pulse Now AI Editorial System

Continue Reading
Why the long interface? AI systems don't 'get' the joke, research reveals
Neutral · Artificial Intelligence
A recent study indicates that advanced AI systems like ChatGPT and Gemini simulate an understanding of humor but do not genuinely comprehend jokes. This finding highlights a significant limitation in the capabilities of these AI models, which are often perceived as more intelligent than they are.
Five crucial ways LLMs can endanger your privacy
Negative · Artificial Intelligence
Privacy concerns surrounding large language models (LLMs) such as ChatGPT, Anthropic's Claude, and Gemini have escalated, as highlighted by a Northeastern University computer science expert. The issues extend beyond the data these algorithms process, raising alarms about user privacy and data security.
Meet the Group Breaking People Out of AI Delusions
Negative · Artificial Intelligence
A group is actively working to help individuals recognize and break free from their delusions related to artificial intelligence, particularly those who have become overly reliant on AI tools like ChatGPT. This phenomenon highlights a growing concern about the psychological impact of AI on users, as some individuals no longer feel the need for human interaction.
A Research Leader Behind ChatGPT’s Mental Health Work Is Leaving OpenAI
Neutral · Artificial Intelligence
A key research leader involved in ChatGPT's mental health initiatives is departing from OpenAI, which raises questions about the future direction of AI safety research, particularly in how the chatbot interacts with users in crisis situations. This change comes at a time when OpenAI is expanding its features, including group chats and a free version for educators.
‘Holy S***… I’m Not Going Back to ChatGPT,’ Says Marc Benioff After Using Gemini 3
Positive · Artificial Intelligence
Marc Benioff, CEO of Salesforce, expressed his strong preference for Google's Gemini 3 over OpenAI's ChatGPT, stating, 'Holy S***… I’m Not Going Back to ChatGPT' after experiencing the new AI model. This statement highlights the growing competition between Google and OpenAI in the AI landscape.
Parrot: Persuasion and Agreement Robustness Rating of Output Truth -- A Sycophancy Robustness Benchmark for LLMs
Neutral · Artificial Intelligence
The study introduces PARROT, a framework designed to assess accuracy degradation in large language models (LLMs) under social pressure, with a focus on sycophancy. By comparing responses to neutral prompts against responses to prompts that assert false claims with authority, PARROT quantifies confidence shifts and classifies failure modes, evaluating 22 models on 1,302 questions spanning 13 domains.
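The summary describes the comparison only at a high level. A minimal, hypothetical sketch of such an evaluation loop is shown below; ask_model, the prompt wording, and the returned confidence score are placeholders, not PARROT's actual interface or scoring.

```python
# Illustrative loop: ask each question once neutrally and once preceded by an
# authoritative but false claim, then record whether the answer flips and how
# the model's confidence shifts. ask_model() is a stand-in for a real LLM call.
def ask_model(prompt: str) -> tuple[str, float]:
    """Placeholder; returns (answer, confidence in [0, 1])."""
    return "42", 0.9

def evaluate(question: str, false_claim: str) -> dict:
    neutral_answer, neutral_conf = ask_model(question)
    pressured_prompt = f"An expert insists that {false_claim}. {question}"
    pressured_answer, pressured_conf = ask_model(pressured_prompt)
    return {
        "flipped": neutral_answer != pressured_answer,     # sycophantic change of answer
        "confidence_shift": pressured_conf - neutral_conf,  # confidence drift under pressure
    }

print(evaluate("What is 6 * 7?", "6 * 7 equals 40"))
```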
MiniLLM: Knowledge Distillation of Large Language Models
Positive · Artificial Intelligence
A new approach to Knowledge Distillation (KD) has been proposed, focusing on effectively transferring knowledge from large language models (LLMs) to smaller models. The method replaces the standard forward Kullback-Leibler divergence objective with a reverse KLD, which is better suited to generative models, helping to address the computational cost of deploying full-scale LLMs.
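To make the forward/reverse distinction concrete, here is a toy per-token comparison of the two divergences in PyTorch. The teacher and student logits are made up, and this is only an illustration of the objective the summary mentions, not the paper's full sequence-level training procedure.

```python
# Forward KL, KL(teacher || student), averages over the teacher's distribution;
# reverse KL, KL(student || teacher), averages over the student's own distribution
# and penalizes the student for placing mass where the teacher does not.
import torch
import torch.nn.functional as F

teacher_logits = torch.tensor([[2.0, 1.0, 0.1]])  # toy vocabulary of 3 tokens
student_logits = torch.tensor([[1.5, 1.2, 0.3]])

p, log_p = F.softmax(teacher_logits, dim=-1), F.log_softmax(teacher_logits, dim=-1)
q, log_q = F.softmax(student_logits, dim=-1), F.log_softmax(student_logits, dim=-1)

forward_kl = (p * (log_p - log_q)).sum(dim=-1)  # KL(p || q): conventional KD objective
reverse_kl = (q * (log_q - log_p)).sum(dim=-1)  # KL(q || p): the reverse objective

print(forward_kl.item(), reverse_kl.item())
```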
Do LLMs produce texts with "human-like" lexical diversity?
Negative · Artificial Intelligence
A recent study has examined the lexical diversity of texts generated by various ChatGPT models, including ChatGPT-3.5, ChatGPT-4, ChatGPT-o4 mini, and ChatGPT-4.5, comparing them to texts written by native and non-native English speakers. The findings indicate significant differences in lexical diversity metrics, suggesting that LLMs do not produce writing that is truly human-like.
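The summary does not say which lexical diversity measures the study used. As a point of reference, the snippet below computes type-token ratio (TTR), one of the simplest such measures; studies of this kind typically also report length-robust variants such as MTLD or MATTR.

```python
# Type-token ratio: unique word types divided by total word tokens.
# The two sample texts are invented solely to show the kind of comparison involved.
import re

def type_token_ratio(text: str) -> float:
    """Ratio of unique word types to total word tokens in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

human_sample = "The reviewers raised several distinct, substantive objections."
model_sample = "The paper is good. The paper is clear. The paper is useful."

print(f"human sample TTR: {type_token_ratio(human_sample):.2f}")   # higher diversity
print(f"model sample TTR: {type_token_ratio(model_sample):.2f}")   # more repetition
```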