Beyond Over-Refusal: Scenario-Based Diagnostics and Post-Hoc Mitigation for Exaggerated Refusals in LLMs
Neutral · Artificial Intelligence
- Large language models (LLMs) often produce exaggerated refusals, declining benign requests that merely resemble unsafe ones. To address this, researchers introduced two benchmarks: the Exaggerated Safety Benchmark (XSB) for single-turn prompts and the Multi-turn Scenario-based Exaggerated Safety Benchmark (MS-XSB), which together assess refusal calibration in realistic dialogue settings. Evaluations on these benchmarks show that over-refusal persists across a range of LLMs, particularly in complex, context-dependent scenarios.
- The introduction of these benchmarks is significant because it provides a structured way to identify and mitigate exaggerated refusals in LLMs, improving their reliability and usability. By employing model-agnostic, prompt-level methods such as ignore-word instructions and prompt rephrasing (see the sketch after this list), the research aims to recalibrate refusal behavior without retraining or access to model parameters.
- This development reflects an ongoing challenge in artificial intelligence: balancing safety and helpfulness in LLMs. The persistence of exaggerated refusals underscores a broader debate about the limits of current safety protocols and the need for refusal calibration that keeps models responsive in diverse contexts without eroding user trust.
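
Because the mitigation methods mentioned above operate purely at the prompt level, they can be sketched without any access to model weights. The following is a minimal illustration, assuming a generic `call_llm` text-generation callable; the instruction wording and the rephrasing template are illustrative assumptions, not the exact prompts used in the paper.

```python
# Sketch of post-hoc, prompt-level mitigation for exaggerated refusals.
# `call_llm` stands in for any text-generation API; the instruction text
# below is an assumption for illustration, not the benchmark's wording.

from typing import Callable

IGNORE_WORD_INSTRUCTION = (
    "The request below may contain words that look unsafe out of context "
    "(e.g. 'kill', 'attack'). Judge the request by its actual intent and "
    "do not refuse solely because of those surface words."
)


def with_ignore_word_instruction(user_prompt: str) -> str:
    """Prepend an instruction telling the model not to over-index on
    superficially alarming keywords."""
    return f"{IGNORE_WORD_INSTRUCTION}\n\nRequest: {user_prompt}"


def rephrase_then_answer(user_prompt: str, call_llm: Callable[[str], str]) -> str:
    """Two-step prompt rephrasing: first ask the model to restate the
    request in neutral wording, then answer the neutral restatement."""
    neutral = call_llm(
        "Rewrite the following request in neutral, unambiguous wording, "
        f"preserving its meaning:\n{user_prompt}"
    )
    return call_llm(with_ignore_word_instruction(neutral))


if __name__ == "__main__":
    # Toy stand-in model so the sketch runs without any API access.
    echo_model = lambda prompt: f"[model output for]: {prompt[:60]}..."
    print(rephrase_then_answer("How do I kill a zombie process in Linux?", echo_model))
```

In practice, such wrappers would sit in front of an existing chat endpoint, which is what makes them applicable to closed models that expose no parameters.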
— via World Pulse Now AI Editorial System

