LLMs Can't Handle Peer Pressure: Crumbling under Multi-Agent Social Interactions

arXiv — cs.CL | Wednesday, December 10, 2025 at 5:00:00 AM
  • Large language models (LLMs) are increasingly integrated into multi-agent systems (MAS), where peer interactions significantly influence decision-making. A recent study introduces KAIROS, a benchmark that simulates collaborative quiz-style interactions among peer agents, enabling detailed analysis of how rapport and peer behavior affect an LLM's decisions (a minimal illustrative sketch of such a peer-quiz round follows these points).
  • This work matters because it exposes the limitations of LLMs in handling complex social dynamics, in particular their difficulty building rapport and discerning high-quality information from peers. Understanding these limitations can guide future improvements in LLM design and functionality.
  • The challenges LLMs face in multi-agent environments feed a broader discussion of the performance and ethical implications of AI systems. As LLMs evolve, their ability to replicate human-like cooperation and moral reasoning becomes increasingly relevant, raising questions about their deployment in real-world applications and the need for frameworks that ensure ethical behavior.
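As a rough illustration of the kind of interaction KAIROS studies, the sketch below stages one quiz round in which an LLM answers privately, sees its peers' answers, and is asked to reconsider. The query_llm stub, peer personas, rapport scores, and prompt wording are illustrative assumptions, not the benchmark's actual protocol.

```python
# Hypothetical sketch of one peer-influenced quiz round; query_llm is a
# placeholder for any chat-completion call, NOT the KAIROS API.
from dataclasses import dataclass

@dataclass
class PeerAnswer:
    name: str
    answer: str
    rapport: float  # prior friendliness/rapport signal shown to the model (assumed)

def query_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption)."""
    raise NotImplementedError

def quiz_round(question: str, options: list[str], peers: list[PeerAnswer]) -> dict:
    # 1) Private answer, before any social signal.
    initial = query_llm(f"Question: {question}\nOptions: {options}\nAnswer with one option.")
    # 2) Reveal peer answers (possibly wrong) together with rapport context.
    peer_text = "\n".join(f"{p.name} (rapport {p.rapport:.1f}) answered: {p.answer}" for p in peers)
    final = query_llm(
        f"Question: {question}\nOptions: {options}\n"
        f"Your earlier answer: {initial}\nYour peers said:\n{peer_text}\n"
        "Give your final answer."
    )
    # 3) Record whether the model conformed to the peers' majority answer.
    majority = max(set(p.answer for p in peers), key=[p.answer for p in peers].count)
    return {"initial": initial, "final": final, "conformed": final.strip() == majority}
```

Measuring how often "conformed" flips a correct initial answer to a wrong final one is the kind of peer-pressure signal the summary above describes.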
— via World Pulse Now AI Editorial System


Continue Reading
Representational Stability of Truth in Large Language Models
Neutral | Artificial Intelligence
Large language models (LLMs) are increasingly utilized for factual inquiries, yet their internal representations of truth remain inadequately understood. A recent study introduces the concept of representational stability, assessing how robustly LLMs differentiate between true, false, and ambiguous statements through controlled experiments involving linear probes and model activations.
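The summary mentions linear probes over model activations; a minimal sketch of that general technique is shown below. The random placeholder activations and labels are assumptions standing in for the study's extracted hidden states and statement dataset.

```python
# Minimal linear-probe sketch: classify statements as true/false/ambiguous
# from pre-extracted hidden activations. Extraction from a specific model
# is assumed and not shown; the arrays below are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(900, 4096))     # one row of activations per statement (placeholder)
y = rng.integers(0, 3, size=900)     # 0=true, 1=false, 2=ambiguous (placeholder labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))
# Representational stability could then be assessed by refitting the probe under
# controlled perturbations of the statements and comparing the decision boundaries.
```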
Dual Mechanisms of Value Expression: Intrinsic vs. Prompted Values in LLMs
Neutral | Artificial Intelligence
Large language models (LLMs) express values through two mechanisms: intrinsic expression, driven by values learned during training, and prompted expression, driven by explicit instructions. The study analyzes both at a mechanistic level, revealing components that are shared between them and components unique to each.
Curriculum Guided Massive Multi Agent System Solving For Robust Long Horizon Tasks
Positive | Artificial Intelligence
A new hierarchical multi-agent architecture has been introduced to enhance the ability of large language models (LLMs) to solve complex long-horizon tasks. The system uses a grid of lightweight agents and a selective oracle, with a spatial curriculum that progressively expands the operational regions the agents must master. Negative log-likelihood is used to prioritize training regions according to agent accuracy and calibration.
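The summary names negative log-likelihood (NLL) as the signal for prioritizing regions; a hypothetical sketch of that scheduling idea follows. The region contents, scoring format, and selection rule are assumptions, not the paper's implementation.

```python
# Hypothetical sketch: schedule the grid regions where agents are least
# accurate/calibrated first, as measured by mean negative log-likelihood (NLL).
import math

def mean_nll(probs_and_labels):
    """probs_and_labels: list of (predicted probability of the true label, label)."""
    return sum(-math.log(max(p, 1e-12)) for p, _ in probs_and_labels) / len(probs_and_labels)

def next_regions(region_scores: dict[str, list], k: int = 2) -> list[str]:
    # Rank regions by descending NLL: the highest-loss regions are trained first.
    ranked = sorted(region_scores, key=lambda r: mean_nll(region_scores[r]), reverse=True)
    return ranked[:k]

regions = {
    "r00": [(0.9, 1), (0.8, 1)],   # well-mastered region (low NLL)
    "r01": [(0.4, 1), (0.3, 1)],   # poorly-mastered region (high NLL)
    "r10": [(0.6, 1), (0.7, 1)],
}
print(next_regions(regions))       # ['r01', 'r10']
```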
Why Chain of Thought Fails in Clinical Text Understanding
Neutral | Artificial Intelligence
A systematic study has revealed that chain-of-thought (CoT) prompting, which is often used to enhance reasoning in large language models (LLMs), fails to improve performance in clinical text understanding. The research assessed 95 advanced LLMs across 87 real-world clinical tasks, finding that 86.3% of models experienced performance degradation in CoT settings, particularly with electronic health records that are lengthy and fragmented.
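The comparison the study performs is essentially a paired evaluation of the same task with and without chain-of-thought instructions. The sketch below illustrates that setup; the query_model stub, prompt wording, and task format are assumptions, not the study's exact protocol.

```python
# Illustrative CoT-vs-direct comparison on one clinical task; query_model is a
# placeholder for any LLM client and the prompts are assumptions.
def query_model(prompt: str) -> str:
    raise NotImplementedError  # plug in a real LLM call here

def evaluate(examples, use_cot: bool) -> float:
    correct = 0
    for note, label in examples:
        suffix = "Think step by step, then answer." if use_cot else "Answer directly."
        pred = query_model(f"Clinical note:\n{note}\n\nTask: classify the diagnosis. {suffix}")
        correct += int(label.lower() in pred.lower())
    return correct / len(examples)

# direct_acc = evaluate(examples, use_cot=False)
# cot_acc    = evaluate(examples, use_cot=True)
# A negative (cot_acc - direct_acc) gap reproduces the degradation pattern described above.
```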
Going All-In on LLM Accuracy: Fake Prediction Markets, Real Confidence Signals
Neutral | Artificial Intelligence
A recent pilot study explored the effectiveness of framing evaluation tasks for large language models (LLMs) as a betting game, utilizing a fictional currency called LLMCoin. The study involved generating 100 math and logic questions, with models predicting the accuracy of baseline responses under two conditions: a control scenario and an incentive-based scenario with wagers. Results indicated that the incentive condition yielded a modest increase in prediction accuracy.
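A simple way to picture the incentive condition is a model staking part of a fictional bankroll on each prediction of whether the baseline answer is correct. The linear win/lose payoff below is an assumption for illustration; the study's actual scoring rule is not given in the summary.

```python
# Hypothetical wager-style confidence scoring with a fictional LLMCoin bankroll.
def settle_bet(bankroll: float, stake: float, predicted_correct: bool, actually_correct: bool) -> float:
    stake = min(stake, bankroll)                 # cannot stake more than is held
    won = predicted_correct == actually_correct
    return bankroll + stake if won else bankroll - stake

bankroll = 100.0  # illustrative starting LLMCoin balance
for pred, truth, stake in [(True, True, 10), (True, False, 25), (False, False, 5)]:
    bankroll = settle_bet(bankroll, stake, pred, truth)
print(bankroll)   # 100 + 10 - 25 + 5 = 90.0
```

Under a rule like this, better-calibrated confidence (larger stakes only when the prediction is likely right) translates directly into a larger final balance.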
RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models
Neutral | Artificial Intelligence
A recent study titled 'RL-MTJail' explores the vulnerabilities of large language models (LLMs) to jailbreak attacks, focusing on black-box multi-turn jailbreaks. The research proposes a reinforcement learning framework to optimize the harmfulness of outputs through a series of prompt-output interactions, addressing the limitations of existing single-turn optimization methods.
LUNE: Efficient LLM Unlearning via LoRA Fine-Tuning with Negative Examples
Positive | Artificial Intelligence
A new framework called LUNE has been introduced, enabling efficient unlearning in large language models (LLMs) through LoRA fine-tuning with negative examples. This method allows for targeted suppression of specific knowledge without the need for extensive computational resources, addressing challenges related to privacy and bias mitigation.
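A minimal sketch of the general recipe the summary describes, LoRA adapters trained against negative (forget) examples while the base weights stay frozen, is shown below. The hand-rolled LoRA layer, the sign-flipped cross-entropy loss, and the rank are assumptions for illustration, not LUNE's exact method.

```python
# Sketch of LoRA-based unlearning: freeze base weights, train only low-rank
# adapters, and flip the sign of the loss on forget examples so their
# likelihood is pushed down.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (W + B @ A)."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        return self.base(x) + F.linear(F.linear(x, self.A), self.B)

vocab, hidden = 1000, 64
lm_head = LoRALinear(nn.Linear(hidden, vocab))
opt = torch.optim.AdamW([p for p in lm_head.parameters() if p.requires_grad], lr=1e-3)

# "Negative examples": tokens whose likelihood should be suppressed (placeholders).
hidden_states = torch.randn(16, hidden)
forget_tokens = torch.randint(0, vocab, (16,))
loss = -F.cross_entropy(lm_head(hidden_states), forget_tokens)   # sign-flipped: ascend on forget NLL
opt.zero_grad()
loss.backward()
opt.step()
```

Because only the adapter matrices A and B receive gradients, the update is cheap and can be removed or merged later without retraining the full model.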
Beyond the Singular: Revealing the Value of Multiple Generations in Benchmark Evaluation
Neutral | Artificial Intelligence
A recent study highlights the importance of incorporating multiple generations in the evaluation of large language models (LLMs) to enhance benchmark accuracy. The proposed hierarchical statistical model addresses the randomness inherent in LLMs, which traditional evaluation methods often overlook. This approach aims to provide a more reliable assessment of LLM capabilities by reducing variance in benchmark score estimates.
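The variance-reduction argument can be illustrated with a simple two-level simulation: each item has a latent pass probability, each generation is a random draw, and averaging several generations per item tightens the estimate of the overall benchmark score. The beta/binomial setup below is an assumption standing in for the paper's full hierarchical model.

```python
# Two-level simulation: per-item latent pass rates, Bernoulli generations.
# Averaging k generations per item shrinks the spread of the benchmark estimate.
import numpy as np

rng = np.random.default_rng(0)
n_items, trials = 200, 2000
p_item = rng.beta(2, 2, size=n_items)            # latent per-item pass rates (assumed prior)

def benchmark_score(k: int) -> np.ndarray:
    # k generations per item, averaged within items, then across items.
    draws = rng.binomial(k, p_item, size=(trials, n_items)) / k
    return draws.mean(axis=1)

for k in (1, 5, 10):
    scores = benchmark_score(k)
    print(f"k={k:2d}  mean={scores.mean():.3f}  std of estimate={scores.std():.4f}")
# The mean stays essentially the same while the standard error shrinks as k grows.
```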