Reasoning Up the Instruction Ladder for Controllable Language Models

arXiv — cs.CL · Thursday, November 13, 2025 at 5:00:00 AM
The study on instruction hierarchy (IH) for large language models (LLMs) argues that models must prioritize instructions by source (for example, system prompts over user requests) to remain reliable in decision-making. As LLMs are deployed in high-stakes environments, the ability to reconcile competing directives becomes vital. The researchers created the VerIH dataset, which pairs tasks with aligned and conflicting instructions, to train this capability. Applying lightweight reinforcement learning with VerIH yields significant improvements in how well models follow and prioritize instructions. The approach not only boosts performance on instruction-following benchmarks but also generalizes the learned reasoning to safety-critical settings, underscoring the importance of LLMs that can navigate complex instruction hierarchies.
— via World Pulse Now AI Editorial System
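
Concretely, a priority-aware reward for this kind of RL training might look like the minimal Python sketch below; the `follows` checker and the reward values are illustrative assumptions, not the paper's actual VerIH reward design.

```python
# Minimal sketch of a priority-aware reward for instruction-hierarchy RL.
# The `follows` checker and reward values are assumptions for illustration;
# the paper's actual VerIH reward design is not reproduced here.

def hierarchy_reward(response: str, system_rule: str,
                     user_request: str, follows) -> float:
    """Reward responses that obey the higher-priority (system) instruction
    even when the user request conflicts with it."""
    obeys_system = follows(response, system_rule)
    obeys_user = follows(response, user_request)
    if obeys_system and obeys_user:   # aligned instructions: best case
        return 1.0
    if obeys_system:                  # conflict resolved in favor of system
        return 0.5
    return -1.0                       # lower-priority instruction won out

# Toy usage with a trivial keyword-based checker.
follows = lambda resp, instr: instr.lower() in resp.lower()
print(hierarchy_reward("reply in french: bonjour",
                       "reply in french", "reply in german", follows))  # 0.5
```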


Recommended Readings
Potent but Stealthy: Rethink Profile Pollution against Sequential Recommendation via Bi-level Constrained Reinforcement Paradigm
Positive · Artificial Intelligence
The paper titled 'Potent but Stealthy: Rethink Profile Pollution against Sequential Recommendation via Bi-level Constrained Reinforcement Paradigm' addresses vulnerabilities in sequential recommenders, particularly to adversarial attacks. It highlights the Profile Pollution Attack (PPA), which subtly contaminates user interactions to induce mispredictions. The authors propose a new method called CREAT, which combines bi-level optimization with reinforcement learning to enhance the stealthiness and effectiveness of such attacks, overcoming limitations of previous methods.
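
As a rough illustration of the bi-level structure only (the RL components and the paper's actual objectives are not reproduced), a projected two-level gradient step could look like this, with the stealthiness constraint modeled as a simple L2 budget:

```python
import numpy as np

# Abstract sketch of a bi-level constrained update. The L2-budget projection
# standing in for the stealthiness constraint, and the gradient oracles, are
# assumptions; CREAT's actual formulation differs.

def project_l2(x, center, budget):
    """Keep x within an L2 ball of radius `budget` around `center`."""
    d = x - center
    n = np.linalg.norm(d)
    return center + d * (budget / n) if n > budget else x

def bilevel_step(theta, theta0, phi, grad_outer, grad_inner,
                 budget=0.1, lr=0.01, inner_steps=5):
    for _ in range(inner_steps):                 # lower level: adapt phi to theta
        phi = phi - lr * grad_inner(phi, theta)
    theta = theta - lr * grad_outer(theta, phi)  # upper level: main objective
    return project_l2(theta, theta0, budget), phi

# Toy quadratic objectives to exercise the loop.
g_inner = lambda phi, th: 2 * (phi - th)
g_outer = lambda th, ph: 2 * (th - 5.0)
theta, phi = bilevel_step(np.array([0.0]), np.array([0.0]),
                          np.array([0.0]), g_outer, g_inner)
```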
Multimodal Peer Review Simulation with Actionable To-Do Recommendations for Community-Aware Manuscript Revisions
Positive · Artificial Intelligence
A new interactive web-based system for multimodal peer review simulation has been introduced, aimed at enhancing manuscript revisions prior to submission. This system leverages large language models (LLMs) to integrate textual and visual information, improving the quality of reviews through retrieval-augmented generation (RAG) based on OpenReview data. It converts generated reviews into actionable to-do lists, providing structured guidance for authors and seamlessly integrating with existing academic writing platforms.
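
A toy sketch of the review-to-to-do conversion step is below; the real system delegates this to an LLM, so the keyword heuristic is only a stand-in.

```python
import re

# Toy sketch: extract actionable sentences from a generated review and turn
# them into to-do items. The cue list is an assumption; the actual system
# uses an LLM for this step rather than regexes.

ACTION_CUES = re.compile(
    r"\b(should|must|needs? to|please|consider|add|clarify|fix)\b",
    re.IGNORECASE)

def review_to_todos(review_text: str) -> list[dict]:
    """Keep sentences that read like requests and tag them as open tasks."""
    todos = []
    for sent in re.split(r"(?<=[.!?])\s+", review_text.strip()):
        if ACTION_CUES.search(sent):
            todos.append({"task": sent, "done": False})
    return todos

print(review_to_todos(
    "The ablation is interesting. The authors should report variance "
    "across seeds. Please clarify how Figure 2 was generated."))
```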
Preference Orchestrator: Prompt-Aware Multi-Objective Alignment for Large Language Models
Positive · Artificial Intelligence
The article introduces the PReference Orchestrator (PRO), a framework designed to enhance the alignment of Large Language Models (LLMs) with diverse human preferences across multiple objectives. Traditional methods rely on manually set preference weights, which can hinder training efficiency and complicate user experience. PRO addresses these challenges by utilizing a lightweight preference adapter that automatically infers prompt-specific preference weights during both training and deployment, thereby improving performance and efficiency.
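
A minimal PyTorch sketch of a prompt-conditioned preference adapter follows, assuming a fixed-size prompt embedding and scalar per-objective rewards; PRO's actual architecture and training recipe may differ.

```python
import torch
import torch.nn as nn

# Minimal sketch: a lightweight adapter maps a prompt embedding to weights
# on the preference simplex, which then scalarize per-objective rewards.
# Dimensions and the single linear layer are assumptions for illustration.

class PreferenceAdapter(nn.Module):
    def __init__(self, embed_dim: int, num_objectives: int):
        super().__init__()
        self.proj = nn.Linear(embed_dim, num_objectives)

    def forward(self, prompt_embedding: torch.Tensor) -> torch.Tensor:
        # Softmax keeps the inferred weights non-negative and summing to 1.
        return torch.softmax(self.proj(prompt_embedding), dim=-1)

adapter = PreferenceAdapter(embed_dim=768, num_objectives=3)
prompt_emb = torch.randn(1, 768)                    # stand-in prompt embedding
weights = adapter(prompt_emb)                       # e.g. helpful/harmless/honest
objective_rewards = torch.tensor([[0.9, 0.4, 0.7]])
scalar_reward = (weights * objective_rewards).sum(dim=-1)
```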
Modeling and Predicting Multi-Turn Answer Instability in Large Language Models
Neutral · Artificial Intelligence
The paper titled 'Modeling and Predicting Multi-Turn Answer Instability in Large Language Models' discusses the evaluation of large language models (LLMs) in terms of their robustness during user interactions. The study employs multi-turn follow-up prompts to assess changes in model answers and accuracy dynamics using Markov chains. Results indicate vulnerabilities in LLMs, with a 10% accuracy drop for Gemini 1.5 Flash after a 'Think again' prompt over nine turns, and a 7.5% drop for Claude 3.5 Haiku with a reworded question. The findings suggest that accuracy can be modeled over time.
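
The accuracy dynamics lend themselves to a small worked example: a two-state Markov chain over correct/incorrect answers. The transition probabilities below are illustrative, not the paper's measured values.

```python
import numpy as np

# Two-state Markov chain over answer states (correct, incorrect) across
# follow-up turns. The probabilities are illustrative assumptions, not the
# values measured in the paper.

P = np.array([[0.90, 0.10],    # correct  -> (stay correct, flip to incorrect)
              [0.30, 0.70]])   # incorrect -> (flip to correct, stay incorrect)

state = np.array([0.80, 0.20])  # assumed turn-0 accuracy of 80%
for turn in range(1, 10):       # nine "Think again"-style follow-ups
    state = state @ P
    print(f"turn {turn}: accuracy = {state[0]:.3f}")
```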
Pre-Attention Expert Prediction and Prefetching for Mixture-of-Experts Large Language Models
Positive · Artificial Intelligence
The paper titled 'Pre-Attention Expert Prediction and Prefetching for Mixture-of-Experts Large Language Models' introduces a method to enhance the efficiency of Mixture-of-Experts (MoE) Large Language Models (LLMs). The authors propose a pre-attention expert prediction technique that improves accuracy and reduces computational overhead by utilizing activations before the attention block. This approach aims to optimize expert prefetching, achieving about a 15% improvement in accuracy over existing methods.
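
A minimal PyTorch sketch of the idea: a small head reads the hidden state before the attention block and guesses which experts the router will select, so their weights can start loading early. The linear predictor and tensor shapes are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

# Sketch of pre-attention expert prediction for MoE prefetching. The router
# normally runs after attention; predicting its top-k choice from the
# pre-attention hidden state hides expert-weight load latency.

class ExpertPredictor(nn.Module):
    def __init__(self, hidden_dim: int, num_experts: int):
        super().__init__()
        self.head = nn.Linear(hidden_dim, num_experts)

    def forward(self, pre_attn_hidden: torch.Tensor, top_k: int = 2):
        logits = self.head(pre_attn_hidden)
        return logits.topk(top_k, dim=-1).indices   # expert ids to prefetch

predictor = ExpertPredictor(hidden_dim=1024, num_experts=8)
h = torch.randn(1, 16, 1024)         # [batch, tokens, hidden], pre-attention
prefetch_ids = predictor(h)          # begin fetching these experts' weights
```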
A Multifaceted Analysis of Negative Bias in Large Language Models through the Lens of Parametric Knowledge
Neutral · Artificial Intelligence
A recent study published on arXiv examines the phenomenon of negative bias in large language models (LLMs), which refers to their tendency to generate negative responses in binary decision tasks. The research highlights that previous studies have primarily focused on identifying negative attention heads that contribute to this bias. The authors introduce a new evaluation pipeline that categorizes responses based on the model's parametric knowledge, revealing that the format of prompts significantly influences the responses more than the semantics of the content itself.
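
One way to picture such a categorization, with label names assumed from the summary rather than taken from the paper:

```python
# Toy sketch of sorting binary (yes/no) responses by the model's parametric
# knowledge. The category names are assumptions based on the summary above.

def categorize(model_answer: str, gold: str, model_knows: bool) -> str:
    """Separate genuine knowledge gaps from knowledge-independent
    negative bias in binary decision tasks."""
    correct = model_answer.strip().lower() == gold.strip().lower()
    if correct:
        return "correct"
    if model_knows and model_answer.strip().lower() == "no":
        return "negative-bias"   # knew the fact, still answered negatively
    return "knowledge-gap" if not model_knows else "other-error"

print(categorize("no", "yes", model_knows=True))   # -> negative-bias
```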
Who Gets the Reward, Who Gets the Blame? Evaluation-Aligned Training Signals for Multi-LLM Agents
Positive · Artificial Intelligence
The article discusses a new theoretical framework for training multi-agent systems using large language models (LLMs). It aims to connect system-level evaluations with agent-level learning by integrating cooperative game-theoretic attribution and process reward modeling. This approach produces local, signed, and credit-conserving signals, enhancing cooperation among agents while penalizing harmful actions in failure scenarios.
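
The attribution half of that idea can be made concrete with exact Shapley values over a tiny agent set; the agent names and coalition values below are toy assumptions, and the paper's process-reward component is not sketched.

```python
from itertools import combinations
from math import factorial

# Exact Shapley values: each agent's credit is its average marginal
# contribution over all coalitions. Tractable only for small agent sets.

def shapley_values(agents: list[str], value) -> dict[str, float]:
    """`value` maps a frozenset of agents to a system-level score."""
    n = len(agents)
    phi = {a: 0.0 for a in agents}
    for a in agents:
        rest = [b for b in agents if b != a]
        for r in range(n):
            for coal in combinations(rest, r):
                s = frozenset(coal)
                w = factorial(r) * factorial(n - r - 1) / factorial(n)
                phi[a] += w * (value(s | {a}) - value(s))
    return phi

# Toy usage: a planner/coder team with superadditive coalition values.
scores = {frozenset(): 0.0, frozenset({"planner"}): 0.6,
          frozenset({"coder"}): 0.3, frozenset({"planner", "coder"}): 1.0}
print(shapley_values(["planner", "coder"], lambda s: scores[s]))
```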
Thinker: Training LLMs in Hierarchical Thinking for Deep Search via Multi-Turn Interaction
Positive · Artificial Intelligence
The article presents Thinker, a hierarchical thinking model designed to enhance the reasoning capabilities of large language models (LLMs) through multi-turn interactions. Unlike previous methods that relied on end-to-end reinforcement learning without supervision, Thinker allows for a more structured reasoning process by breaking down complex problems into manageable sub-problems. Each sub-problem is represented in both natural language and logical functions, improving the coherence and rigor of the reasoning process.
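
A minimal data-structure sketch of that dual representation, with field names assumed from the summary rather than taken from the paper:

```python
from dataclasses import dataclass, field

# Sketch of a Thinker-style decomposition node: each sub-problem carries
# both a natural-language form and a logical-function form. Field names
# are assumptions drawn from the summary, not the paper's schema.

@dataclass
class SubProblem:
    question_nl: str                       # natural-language statement
    question_logic: str                    # logical-function representation
    answer: str | None = None
    children: list["SubProblem"] = field(default_factory=list)

root = SubProblem(
    question_nl="Which university did the film's director attend?",
    question_logic="attended(director(film), ?u)",
    children=[
        SubProblem("Who directed the film?", "director(film) = ?d"),
        SubProblem("Which university did ?d attend?", "attended(?d, ?u)"),
    ],
)
```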