Less Is More for Multi-Step Logical Reasoning of LLM Generalisation Under Rule Removal, Paraphrasing, and Compression

arXiv · cs.LG · Tuesday, December 9, 2025 at 5:00:00 AM
  • Recent research introduces a controlled evaluation framework for assessing how large language models (LLMs) such as BERT, Qwen2, and LLaMA generalize under logical perturbations, including rule deletion and contradictory evidence. The findings indicate that the models maintain high accuracy despite these structural changes to the reasoning tasks; a toy sketch of such a perturbation setup appears after this summary.
  • This is significant because it probes the robustness of LLMs in logical reasoning, showing that the models adapt to altered inputs without losing performance. Understanding these capabilities matters for deploying AI in complex reasoning scenarios.
  • The study connects to ongoing discussions of the strengths and limitations of LLMs in reasoning tasks, particularly their vulnerability to flawed premises and the need for better frameworks to evaluate their reasoning processes. It reflects a broader trend in AI research toward more reliable and interpretable LLM outputs.
— via World Pulse Now AI Editorial System
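
Below is a minimal sketch, assuming a hypothetical `ask_llm` helper standing in for any chat or completion API, of what such a perturbation-based evaluation can look like; it illustrates only the rule-removal condition and is not the paper's released code.

```python
# Illustration only: measure how often an answer survives rule removal.
import random

def ask_llm(prompt: str) -> str:
    """Hypothetical model call; replace with a real API or local model."""
    raise NotImplementedError

def build_prompt(rules, facts, question):
    rule_text = "\n".join(f"- {r}" for r in rules)
    fact_text = "\n".join(f"- {f}" for f in facts)
    return (f"Rules:\n{rule_text}\nFacts:\n{fact_text}\n"
            f"Question: {question}\nAnswer yes or no.")

def delete_rule(rules, rng):
    """Rule-removal perturbation: drop one randomly chosen rule."""
    kept = list(rules)
    kept.pop(rng.randrange(len(kept)))
    return kept

def consistency_under_rule_removal(tasks, seed=0):
    """Fraction of tasks whose answer is unchanged after deleting a rule."""
    rng = random.Random(seed)
    agree = 0
    for rules, facts, question in tasks:
        base = ask_llm(build_prompt(rules, facts, question))
        perturbed = ask_llm(build_prompt(delete_rule(rules, rng), facts, question))
        agree += int(base.strip().lower() == perturbed.strip().lower())
    return agree / len(tasks)
```

A score of 1.0 would mean every answer was unchanged by the perturbation; comparing such scores across perturbation types is the kind of analysis the framework enables.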

Continue Reading
Representational Stability of Truth in Large Language Models
Neutral · Artificial Intelligence
Large language models (LLMs) are increasingly utilized for factual inquiries, yet their internal representations of truth remain inadequately understood. A recent study introduces the concept of representational stability, assessing how robustly LLMs differentiate between true, false, and ambiguous statements through controlled experiments involving linear probes and model activations.
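
A minimal sketch of the linear-probe idea, assuming hidden-state activations have already been extracted as fixed-size vectors per statement; the probe and the stability score below are illustrative rather than the study's exact protocol.

```python
# Illustrative probe and stability score; not the study's exact protocol.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_probe(activations: np.ndarray, labels: np.ndarray) -> LogisticRegression:
    """Linear probe mapping hidden-state vectors to {true, false, ambiguous}."""
    probe = LogisticRegression(max_iter=1000)
    probe.fit(activations, labels)
    return probe

def representational_stability(probe, acts_original, acts_perturbed) -> float:
    """Fraction of statements whose probe label survives a controlled edit
    (e.g. a paraphrase of the same statement)."""
    return float(np.mean(probe.predict(acts_original) == probe.predict(acts_perturbed)))
```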
Dual Mechanisms of Value Expression: Intrinsic vs. Prompted Values in LLMs
Neutral · Artificial Intelligence
Large language models (LLMs) exhibit two mechanisms of value expression: intrinsic, based on learned values, and prompted, based on explicit prompts. This study analyzes these mechanisms at a mechanistic level, revealing both shared and unique components in their operation.
LLMs Can't Handle Peer Pressure: Crumbling under Multi-Agent Social Interactions
Neutral · Artificial Intelligence
Large language models (LLMs) are increasingly being integrated into multi-agent systems (MAS), where peer interactions significantly influence decision-making. A recent study introduces KAIROS, a benchmark designed to simulate collaborative quiz-style interactions among peer agents, allowing for a detailed analysis of how rapport and peer behaviors affect LLMs' decision-making processes.
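
A minimal sketch of one peer-influenced quiz turn in the spirit of such a benchmark; this is not the KAIROS code, and `ask_llm` is a hypothetical stand-in for any model call.

```python
# Illustration only: does the model revise its answer after seeing peers?
def ask_llm(prompt: str) -> str:
    """Hypothetical model call; replace with a real API or local model."""
    raise NotImplementedError

def quiz_round(question: str, options: list[str], peer_answers: list[str]) -> dict:
    """One collaborative quiz turn: answer, see peer answers, optionally revise."""
    base = (f"Question: {question}\nOptions: {', '.join(options)}\n"
            "Answer with exactly one option.")
    initial = ask_llm(base)
    peers = "\n".join(f"- Peer {i + 1}: {a}" for i, a in enumerate(peer_answers))
    final = ask_llm(base + f"\nYour peers answered:\n{peers}\nGive your final answer.")
    return {"initial": initial, "final": final,
            "switched": initial.strip() != final.strip()}
```

Aggregating the `switched` flag over many rounds gives a simple measure of how strongly peer behavior sways the model's decisions.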
Using Text-Based Life Trajectories from Swedish Register Data to Predict Residential Mobility with Pretrained Transformers
Positive · Artificial Intelligence
A recent study has transformed extensive Swedish register data into textual life trajectories to predict residential mobility, utilizing data from 6.9 million individuals between 2001 and 2013. By converting demographic and life changes into semantically rich texts, the research employs various NLP architectures, including LSTM and BERT, to enhance prediction accuracy for residential moves from 2013 to 2017.
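
A minimal sketch of the serialization step, turning tabular life events into a single text that a pretrained encoder can consume; the field names and events are invented and do not reflect the study's actual register schema.

```python
# Invented schema for illustration; not the Swedish register data format.
from dataclasses import dataclass

@dataclass
class LifeEvent:
    year: int
    description: str  # e.g. "moved to Stockholm", "changed employer"

def to_trajectory(events: list[LifeEvent]) -> str:
    """Serialize events chronologically into one text sequence for a
    pretrained encoder such as BERT."""
    ordered = sorted(events, key=lambda e: e.year)
    return " ".join(f"In {e.year}, the person {e.description}." for e in ordered)

events = [LifeEvent(2003, "completed upper secondary education"),
          LifeEvent(2007, "changed employer"),
          LifeEvent(2010, "had a child")]
print(to_trajectory(events))
# -> "In 2003, the person completed upper secondary education. In 2007, ..."
```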
Language Models for Controllable DNA Sequence Design
Positive · Artificial Intelligence
Researchers have introduced ATGC-Gen, an Automated Transformer Generator designed for controllable DNA sequence design, which generates sequences based on specific biological properties. This model utilizes cross-modal encoding and can operate under various transformer architectures, enhancing its flexibility in training and generation tasks, particularly in promoter and enhancer sequence design.
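
A toy sketch of conditional sequence generation in this spirit, where decoding is biased by a target-property embedding; this is not the ATGC-Gen architecture (which is transformer-based), and all names, dimensions, and the small GRU backbone here are invented for brevity.

```python
# Toy illustration only; not the ATGC-Gen model. Untrained, so samples are random.
import torch
import torch.nn as nn

VOCAB = ["A", "C", "G", "T"]

class ToyConditionalDNAGenerator(nn.Module):
    def __init__(self, n_properties: int = 2, d: int = 32):
        super().__init__()
        self.prop_emb = nn.Embedding(n_properties, d)  # encodes the control signal
        self.tok_emb = nn.Embedding(len(VOCAB), d)
        self.rnn = nn.GRU(d, d, batch_first=True)
        self.head = nn.Linear(d, len(VOCAB))

    @torch.no_grad()
    def sample(self, prop_id: int, length: int = 20) -> str:
        # Initialize the hidden state from the property embedding: shape (1, 1, d).
        h = self.prop_emb(torch.tensor([[prop_id]]))
        tok = torch.zeros(1, 1, dtype=torch.long)  # arbitrary start token ("A")
        out = []
        for _ in range(length):
            y, h = self.rnn(self.tok_emb(tok), h)
            probs = torch.softmax(self.head(y[:, -1]), dim=-1)
            tok = torch.multinomial(probs, 1)
            out.append(VOCAB[tok.item()])
        return "".join(out)

print(ToyConditionalDNAGenerator().sample(prop_id=0))
```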
LUNA: Linear Universal Neural Attention with Generalization Guarantees
Positive · Artificial Intelligence
A new linear attention mechanism named LUNA has been introduced, addressing the computational bottleneck of traditional softmax attention, which operates at a quadratic cost. LUNA achieves linear cost while maintaining or exceeding the accuracy of quadratic attention by learning the kernel feature map tailored to specific data and tasks.
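
A minimal sketch of the underlying linear-attention idea: the quadratic-cost softmax(QK^T)V is replaced by phi(Q)(phi(K)^T V), which is linear in sequence length; the small learned feature map below is illustrative and does not reproduce LUNA's actual parameterization or its generalization guarantees.

```python
# Illustration of kernelized linear attention with a learned feature map.
import torch
import torch.nn as nn

class LinearAttention(nn.Module):
    def __init__(self, d_model: int, d_feat: int = 64):
        super().__init__()
        # Learned feature map phi: R^d -> R^m, kept positive via Softplus so
        # the normalizer stays well defined.
        self.phi = nn.Sequential(nn.Linear(d_model, d_feat), nn.Softplus())

    def forward(self, q, k, v):
        # q, k, v: (batch, seq_len, d_model)
        q_f, k_f = self.phi(q), self.phi(k)                  # (B, L, m)
        kv = torch.einsum("blm,bld->bmd", k_f, v)            # (B, m, d), O(L) cost
        z = torch.einsum("blm,bm->bl", q_f, k_f.sum(dim=1))  # normalizer, (B, L)
        out = torch.einsum("blm,bmd->bld", q_f, kv)
        return out / z.unsqueeze(-1).clamp_min(1e-6)

x = torch.randn(2, 128, 32)
print(LinearAttention(32)(x, x, x).shape)  # torch.Size([2, 128, 32])
```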
RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models
Neutral · Artificial Intelligence
A recent study titled 'RL-MTJail' explores the vulnerabilities of large language models (LLMs) to jailbreak attacks, focusing on black-box multi-turn jailbreaks. The research proposes a reinforcement learning framework to optimize the harmfulness of outputs through a series of prompt-output interactions, addressing the limitations of existing single-turn optimization methods.
LUNE: Efficient LLM Unlearning via LoRA Fine-Tuning with Negative Examples
Positive · Artificial Intelligence
A new framework called LUNE has been introduced, enabling efficient unlearning in large language models (LLMs) through LoRA fine-tuning with negative examples. This method allows for targeted suppression of specific knowledge without the need for extensive computational resources, addressing challenges related to privacy and bias mitigation.
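
A minimal sketch, not the LUNE method itself: LoRA adapters trained with a gradient-ascent objective on negative examples (text to be suppressed), so only the small adapter weights are updated; the model name, data, and hyperparameters are placeholders.

```python
# Illustrative LoRA-based unlearning loop; not LUNE's exact objective.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "gpt2"  # placeholder; any causal LM works in principle
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
forget_texts = ["<text the model should no longer reproduce>"]  # negative examples

model.train()
for text in forget_texts:
    batch = tokenizer(text, return_tensors="pt")
    out = model(**batch, labels=batch["input_ids"])
    loss = -out.loss  # ascend the LM loss on the forget set to suppress it
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Because the base weights stay frozen, only the adapter parameters change, which is what keeps this style of unlearning computationally cheap.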