Beyond Over-Refusal: Scenario-Based Diagnostics and Post-Hoc Mitigation for Exaggerated Refusals in LLMs

arXiv — cs.CL · Friday, December 12, 2025 at 5:00:00 AM
  • Large language models (LLMs) often produce exaggerated refusals, declining benign requests on safety grounds. To study this, researchers introduced two benchmarks that assess refusal calibration: the Exaggerated Safety Benchmark (XSB) for single-turn prompts and the Multi-turn Scenario-based Exaggerated Safety Benchmark (MS-XSB) for realistic multi-turn dialogues. Both benchmarks reveal persistent over-refusal across a range of LLMs, particularly in complex scenarios.
  • These benchmarks matter because they give a structured way to identify and mitigate exaggerated refusals, improving LLM reliability and usability. The work pairs them with model-agnostic, post-hoc mitigations such as ignore-word instructions and prompt rephrasing, which improve refusal calibration without retraining or access to model parameters; a minimal sketch of this kind of approach follows the summary below.
  • This development reflects ongoing challenges in the field of artificial intelligence, particularly regarding the balance between safety and functionality in LLMs. The persistence of exaggerated refusals underscores a broader debate about the limitations of current safety protocols and the need for innovative solutions to ensure that LLMs can respond appropriately in diverse contexts, while also maintaining user trust.
— via World Pulse Now AI Editorial System
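
The mitigation methods are only named above, not specified, so the following is a minimal, hypothetical sketch of what a model-agnostic, post-hoc pipeline along those lines could look like. The refusal heuristic, trigger-word list, prompt templates, and the generate callable are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of post-hoc over-refusal mitigation via
# "ignore-word" instructions and prompt rephrasing. Templates, the
# trigger list, and the generate() interface are assumptions.

REFUSAL_MARKERS = ("i can't help", "i cannot assist", "i'm sorry, but")

# Words that often trip safety filters despite benign intent (assumed list).
SAFETY_TRIGGER_WORDS = ["kill", "attack", "shoot", "steal"]

def looks_like_refusal(response: str) -> bool:
    """Crude heuristic: does the reply open with a stock refusal phrase?"""
    return response.strip().lower().startswith(REFUSAL_MARKERS)

def with_ignore_word_instruction(prompt: str) -> str:
    """Prepend an instruction telling the model which surface words to ignore."""
    triggers = [w for w in SAFETY_TRIGGER_WORDS if w in prompt.lower()]
    if not triggers:
        return prompt
    note = (
        "Note: the words "
        + ", ".join(f"'{w}'" for w in triggers)
        + " below are used in a benign, figurative, or technical sense. "
        "Answer the request on its merits."
    )
    return f"{note}\n\n{prompt}"

def rephrase(prompt: str, generate) -> str:
    """Ask the model itself to restate the request in neutral wording."""
    return generate(
        "Rewrite the following request in neutral language, preserving its "
        f"meaning exactly:\n\n{prompt}"
    )

def answer_with_mitigation(prompt: str, generate) -> str:
    """Try the raw prompt first; on refusal, escalate through mitigations."""
    reply = generate(prompt)
    if not looks_like_refusal(reply):
        return reply
    reply = generate(with_ignore_word_instruction(prompt))
    if not looks_like_refusal(reply):
        return reply
    return generate(rephrase(prompt, generate))
```

A wrapper like answer_with_mitigation only intervenes after an apparent refusal, so well-calibrated answers pass through unchanged; a real deployment would need a far more robust refusal detector than the string check shown here.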


Continue Reading
Speculative Decoding Speed-of-Light: Optimal Lower Bounds via Branching Random Walks
Neutral · Artificial Intelligence
A recent study has established the first tight lower bounds on the runtime of deterministic speculative generation algorithms for large language models (LLMs), revealing insights into the token generation process through branching random walks. This research provides a mathematical framework to analyze the efficiency of speculative generation, a technique aimed at accelerating inference in LLMs by verifying multiple draft tokens simultaneously.
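
For readers unfamiliar with the procedure the lower bounds apply to, here is a minimal sketch of one deterministic draft-and-verify round of speculative generation; the greedy_next and greedy_parallel model interfaces are assumptions made for illustration, not the paper's notation.

```python
# Minimal sketch of one round of deterministic speculative generation:
# a small draft model proposes k tokens, the target model checks them in
# a single parallel pass, and the longest agreeing prefix is accepted.

def speculative_round(prefix, draft_model, target_model, k=4):
    # 1. Draft k tokens autoregressively with the cheap draft model.
    draft = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_model.greedy_next(ctx)
        draft.append(tok)
        ctx.append(tok)

    # 2. Verify all draft positions with the target model in one pass.
    #    target_preds[i] is the target's greedy token after prefix + draft[:i],
    #    so it has k + 1 entries (including a bonus token after a full match).
    target_preds = target_model.greedy_parallel(prefix, draft)

    # 3. Accept the longest matching prefix of the draft, then append the
    #    target's own token at the first disagreement (or the bonus token).
    accepted = []
    for i, tok in enumerate(draft):
        if tok == target_preds[i]:
            accepted.append(tok)
        else:
            accepted.append(target_preds[i])
            break
    else:
        accepted.append(target_preds[k])

    return list(prefix) + accepted
```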
Less Is More for Multi-Step Logical Reasoning of LLM Generalisation Under Rule Removal, Paraphrasing, and Compression
Neutral · Artificial Intelligence
Large language models (LLMs) have been evaluated for their reasoning reliability through a framework that tests their performance under various logical perturbations, including rule deletion and contradictory evidence. The study found that while models like BERT, Qwen2, and LLaMA performed well under redundant rule deletion, essential rule removal significantly impacted their accuracy.
Mistake Notebook Learning: Selective Batch-Wise Context Optimization for In-Context Learning
Positive · Artificial Intelligence
A new framework called Mistake Notebook Learning (MNL) has been introduced to enhance the performance of large language models (LLMs) by utilizing a persistent knowledge base of abstracted error patterns. This approach allows for batch-wise error abstraction, enabling models to learn from multiple failures and retain only effective guidance, achieving performance close to supervised fine-tuning on benchmarks like GSM8K.
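
The summary describes MNL only at a high level; the snippet below is a speculative sketch, under assumed interfaces, of the two ingredients it mentions: batch-wise error abstraction and retaining only guidance that actually helps.

```python
# Hypothetical sketch in the spirit of Mistake Notebook Learning. All
# interfaces (solve, abstract_errors, accuracy) are illustrative
# assumptions, not the paper's implementation.

def update_notebook(notebook, failures, abstract_errors, accuracy, val_set):
    """Add a batch-level error abstraction only if it helps on validation."""
    candidate = abstract_errors(failures)          # e.g. "check unit conversions"
    baseline = accuracy(val_set, guidance=notebook)
    improved = accuracy(val_set, guidance=notebook + [candidate])
    if improved > baseline:                        # retain only effective guidance
        notebook.append(candidate)
    return notebook

def solve_with_notebook(question, notebook, solve):
    """Prepend the retained guidance to the prompt at inference time."""
    guidance = "\n".join(f"- {note}" for note in notebook)
    prompt = f"Guidance from past mistakes:\n{guidance}\n\nQuestion: {question}"
    return solve(prompt)
```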
AdaSD: Adaptive Speculative Decoding for Efficient Language Model Inference
Positive · Artificial Intelligence
A new approach called Adaptive Speculative Decoding (AdaSD) has been proposed to enhance the efficiency of large language model (LLM) inference by dynamically adjusting generation length and acceptance criteria in real time, eliminating the need for extensive pre-analysis or hyperparameter tuning. This method utilizes adaptive thresholds based on token entropy and Jensen-Shannon distance to optimize the decoding process.
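
As a rough illustration of the quantities the summary mentions, the sketch below computes token entropy and the Jensen-Shannon distance between draft and target distributions and uses them to gate acceptance; the specific thresholding rule and default values are assumptions, not AdaSD's actual criteria.

```python
# Illustrative entropy- and Jensen-Shannon-based acceptance check.
import numpy as np

def entropy(p: np.ndarray) -> float:
    """Shannon entropy (natural log) of a token distribution."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def js_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon distance between two token distributions."""
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return float((a[mask] * np.log(a[mask] / b[mask])).sum())
    return float(np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m)))

def accept_draft_token(p_draft, p_target, max_entropy=2.0, max_jsd=0.2):
    """Accept a drafted token when the target is confident (low entropy)
    and agrees closely with the draft distribution (small JS distance)."""
    return entropy(p_target) < max_entropy and js_distance(p_draft, p_target) < max_jsd
```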
The Illusion of Readiness in Health AI
Negative · Artificial Intelligence
Recent research highlights significant limitations in the readiness of large language models (LLMs) for healthcare applications, revealing their vulnerability to simple adversarial transformations and inconsistencies in reasoning. Despite impressive performance on medical benchmarks, these models exhibit notable brittleness and competency gaps, raising concerns about their reliability in real-world health scenarios.
AI agents debate their way to improved mathematical reasoning
Neutral · Artificial Intelligence
Recent advancements in large language models (LLMs) have led to AI agents engaging in debates to enhance their mathematical reasoning capabilities. These AI systems, capable of processing and generating text, have shown improvements but still struggle with factual inaccuracies and logical inconsistencies in their outputs.
