Emergent Bayesian Behaviour and Optimal Cue Combination in LLMs

arXiv — cs.CV · Wednesday, December 3, 2025 at 5:00:00 AM
  • A recent study has introduced a behavioral benchmark called BayesBench to evaluate the performance of large language models (LLMs) in multimodal integration tasks, inspired by psychophysics research. The study assesses nine LLMs, including GPT-5 Mini, through magnitude estimation tasks involving text and images, revealing insights into their implicit computational strategies and Bayesian behavior.
  • This development is significant because it offers a structured, psychophysics-inspired way to probe how LLMs weigh and integrate information across modalities, which could inform both model design and downstream applications.
  • The findings contribute to ongoing discussions about the capabilities of LLMs in mimicking human-like reasoning and decision-making processes, highlighting the importance of optimal cue combination in AI systems. This aligns with broader research trends exploring the intersection of AI, human cognition, and the challenges of context drift in multi-turn interactions.
— via World Pulse Now AI Editorial System
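The "optimal cue combination" the benchmark tests for has a simple normative form from psychophysics: under Gaussian noise, two independent cues estimating the same quantity should be fused by weighting each with its inverse variance (precision). A minimal sketch, with illustrative numbers rather than values from the paper:

```python
# Minimal sketch of optimal (precision-weighted) cue combination under
# Gaussian noise, the normative model that Bayesian benchmarks compare
# behavior against. All numbers here are illustrative, not from the paper.

def combine_cues(mu_a, var_a, mu_b, var_b):
    """Maximum-likelihood fusion of two independent Gaussian cues."""
    w_a = (1 / var_a) / (1 / var_a + 1 / var_b)  # precision weight for cue A
    w_b = 1 - w_a
    mu = w_a * mu_a + w_b * mu_b                 # fused estimate
    var = 1 / (1 / var_a + 1 / var_b)            # fused variance, never larger than either cue's
    return mu, var

# Example: a noisy text cue suggests "about 10", a sharper image cue suggests 12.
mu, var = combine_cues(mu_a=10.0, var_a=4.0, mu_b=12.0, var_b=1.0)
print(mu, var)  # 11.6, 0.8 — the estimate is pulled toward the more reliable cue
```

An LLM whose magnitude estimates shift toward the more reliable modality in roughly these proportions is behaving Bayes-optimally in the sense such a benchmark probes.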

Continue Reading
InEx: Hallucination Mitigation via Introspection and Cross-Modal Multi-Agent Collaboration
Positive · Artificial Intelligence
The introduction of InEx presents a novel approach to mitigating hallucinations in large language models (LLMs) by employing a training-free, multi-agent framework that incorporates introspective reasoning and cross-modal collaboration. This method aims to enhance the reliability of multimodal LLMs (MLLMs) by autonomously refining responses through iterative verification processes.
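As a rough illustration of the training-free refine-and-verify loop the summary describes, here is a hedged sketch; the agent roles and function signatures are placeholders, not the paper's actual interfaces:

```python
# Hedged sketch of an iterative introspect-and-verify loop of the kind the
# InEx summary describes. The callables below are illustrative placeholders.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Draft:
    answer: str
    confidence: float  # self-reported confidence in [0, 1]

def refine_until_consistent(
    generate: Callable[[str], Draft],           # base MLLM produces a draft answer
    introspect: Callable[[str, Draft], Draft],  # the model critiques and revises its own reasoning
    verify: Callable[[str, Draft], bool],       # a second agent checks the draft against the image/text
    question: str,
    max_rounds: int = 3,
) -> Draft:
    draft = generate(question)
    for _ in range(max_rounds):
        if verify(question, draft):             # cross-modal check passed: accept the draft
            return draft
        draft = introspect(question, draft)     # otherwise, revise via introspection and retry
    return draft                                # best effort after the round budget is spent
```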
LLMs choose friends and colleagues like people, researchers find
Positive · Artificial Intelligence
Researchers have found that large language models (LLMs) make decisions about networking and friendship in ways that closely resemble human behavior, both in synthetic simulations and real-world contexts, suggesting that LLMs can reproduce human-like social decision-making.
Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling
Positive · Artificial Intelligence
A new quantization method called Four Over Six (4/6) has been introduced to improve NVFP4 quantization, a low-precision format used for efficient training and inference of large language models (LLMs). The method evaluates multiple scale factors per block of values, aiming to reduce the quantization errors that degrade model quality during training and inference.
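As a rough illustration of the core idea, selecting among several candidate scale factors per block, here is a hedged sketch; the toy value grid and candidate multipliers are stand-ins, not the actual NVFP4 format or the paper's 4/6 recipe:

```python
# Hedged sketch of per-block quantization that tries several scale factors
# and keeps the lowest-error one. The 15-level grid and candidate multipliers
# are illustrative placeholders, not the real NVFP4 format.

import numpy as np

LEVELS = np.array([-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0,
                    0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # toy FP4-like grid

def quantize_block(block: np.ndarray, candidates=(1.0, 0.9, 0.75)) -> np.ndarray:
    """Try several scale factors for one block; keep the lowest-error one."""
    best, best_err = block.copy(), np.inf
    for c in candidates:
        scale = c * np.max(np.abs(block)) / np.max(np.abs(LEVELS))
        if scale == 0:
            return block.copy()
        q = LEVELS[np.argmin(np.abs(block[:, None] / scale - LEVELS), axis=1)]
        err = np.sum((q * scale - block) ** 2)      # reconstruction error under this scale
        if err < best_err:
            best, best_err = q * scale, err
    return best

x = np.random.randn(32).astype(np.float32)           # one 32-value block
print(np.mean((quantize_block(x) - x) ** 2))         # block-wise quantization MSE
```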
MoH: Multi-Head Attention as Mixture-of-Head Attention
Positive · Artificial Intelligence
The recent introduction of Mixture-of-Head attention (MoH) enhances the multi-head attention mechanism central to Transformer models, aiming to improve efficiency while maintaining or exceeding previous accuracy levels. This new architecture allows tokens to select relevant attention heads, thereby optimizing inference without increasing parameters.
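A minimal sketch of per-token head selection via top-k gating illustrates the idea; the shapes and gating design here are assumptions for illustration, not the paper's exact architecture:

```python
# Hedged sketch of per-token attention-head selection: a router scores the
# heads for each token and only the top-k head outputs are mixed, in the
# spirit of Mixture-of-Head attention. Shapes and gating are illustrative.

import torch

def mixture_of_heads(head_outputs: torch.Tensor,    # (batch, seq, n_heads, d_head)
                     router: torch.nn.Linear,        # maps d_model -> n_heads scores
                     token_repr: torch.Tensor,       # (batch, seq, d_model)
                     k: int = 2) -> torch.Tensor:
    scores = router(token_repr)                              # (batch, seq, n_heads)
    topk_val, topk_idx = scores.topk(k, dim=-1)
    gates = torch.zeros_like(scores).scatter_(-1, topk_idx, topk_val.softmax(-1))
    # Weight each head's output by its (sparse) gate and sum over heads
    # (a simplification; standard multi-head attention concatenates heads).
    return (head_outputs * gates.unsqueeze(-1)).sum(dim=2)   # (batch, seq, d_head)

B, S, H, Dh, Dm = 2, 5, 8, 16, 128
out = mixture_of_heads(torch.randn(B, S, H, Dh),
                       torch.nn.Linear(Dm, H),
                       torch.randn(B, S, Dm))
print(out.shape)  # torch.Size([2, 5, 16])
```

Since only k of the H heads receive non-zero gates per token, the unselected heads could in principle be skipped, which is the intuition behind gaining efficiency without adding parameters.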
Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe
Positive · Artificial Intelligence
Recent advancements in Diffusion Mixture-of-Experts (MoE) models have shifted focus from routing mechanisms to architectural configurations, revealing that factors like expert modules and attention encodings are crucial for model effectiveness. This systematic study emphasizes the importance of tuning these configurations to maximize performance in both latent and pixel-space diffusion frameworks.
ZIP-RC: Zero-overhead Inference-time Prediction of Reward and Cost for Adaptive and Interpretable Generation
Positive · Artificial Intelligence
A new method called ZIP-RC has been introduced to enhance the inference capabilities of large language models (LLMs) by enabling real-time prediction of reward and cost during generation. This approach addresses the limitations of existing test-time scaling methods, which often lead to increased costs and latency without providing adaptive inference capabilities.
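One way per-step reward and cost predictions could make inference adaptive is by gating how many candidates are sampled; the sketch below is a hypothetical illustration of that idea, not the paper's method:

```python
# Hedged sketch of adaptive inference gated by predicted reward and cost.
# The predictor interface, thresholds, and stopping rule are hypothetical.

from typing import Callable, Tuple

def adaptive_generate(
    sample: Callable[[str], str],                        # draws one candidate answer
    predict: Callable[[str, str], Tuple[float, float]],  # returns (expected_reward, expected_cost)
    prompt: str,
    reward_floor: float = 0.8,
    max_samples: int = 4,
) -> str:
    best_answer, best_reward = "", float("-inf")
    for _ in range(max_samples):
        answer = sample(prompt)
        reward, cost = predict(prompt, answer)
        if reward > best_reward:
            best_answer, best_reward = answer, reward
        # Toy stopping rule: stop once the predicted reward is good enough,
        # or the predicted cost of further sampling outweighs the gain.
        if reward >= reward_floor or cost > reward:
            break
    return best_answer
```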
SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
Neutral · Artificial Intelligence
Recent research introduces Semantically Equivalent and Coherent Attacks (SECA), a method designed to elicit hallucinations from Large Language Models (LLMs) through realistic prompt modifications that maintain semantic coherence. This approach addresses the limitations of previous adversarial attacks that often resulted in unrealistic prompts, thereby enhancing understanding of how hallucinations can occur in practical applications.
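The mechanism, as summarized, amounts to searching over rewordings that preserve meaning and coherence; a hedged sketch of that search loop follows, with all components as placeholders:

```python
# Hedged sketch of searching over semantically equivalent prompt rewrites,
# in the spirit of the SECA summary: propose paraphrases, keep only those
# judged equivalent and coherent, and return the one that most degrades the
# target model's answer. All three callables are illustrative placeholders.

from typing import Callable, List

def seca_style_search(
    paraphrase: Callable[[str], List[str]],        # propose candidate rewrites
    is_equivalent: Callable[[str, str], bool],     # semantic-equivalence / coherence check
    hallucination_score: Callable[[str], float],   # higher = more hallucinated answer
    prompt: str,
) -> str:
    candidates = [p for p in paraphrase(prompt) if is_equivalent(prompt, p)]
    if not candidates:
        return prompt
    return max(candidates, key=hallucination_score)  # most effective equivalent rewrite
```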
InnoGym: Benchmarking the Innovation Potential of AI Agents
Positive · Artificial Intelligence
InnoGym has been introduced as the first benchmark and framework designed to systematically evaluate the innovation potential of AI agents, focusing on both performance gain and novelty across 18 tasks from real-world engineering and scientific domains. This initiative aims to address the limitations of existing benchmarks that primarily measure correctness without considering the diversity of methods behind solutions.