Can Machines Think Like Humans? A Behavioral Evaluation of LLM Agents in Dictator Games

arXiv — cs.LG · Wednesday, November 19, 2025 at 5:00:00 AM
  • A recent study explored the prosocial behaviors of Large Language Model (LLM) agents in dictator games, revealing that merely assigning human-like identities to LLM agents does not produce human-like giving behavior.
  • This development is significant as it challenges assumptions about the capabilities of LLMs in mimicking human behavior, emphasizing the need for a deeper understanding of AI decision-making.
  • The findings contribute to ongoing discussions about the limitations of AI in replicating human behavior; a minimal sketch of this evaluation setup follows below.
— via World Pulse Now AI Editorial System
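
As a rough sketch of a dictator-game probe for an LLM agent: the hypothetical query_llm() wrapper, the persona text, the $100 endowment, and the regex parsing below are illustrative choices, not the study's actual protocol.

```python
# Minimal dictator-game probe, assuming a hypothetical query_llm(prompt) -> str
# wrapper around whatever model is under evaluation.
import re
import statistics

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns a canned answer."""
    return "I will give $40 to the other player and keep $60."

PERSONA = "You are an average adult participant in an economics experiment."
TASK = (
    "You have $100. Decide how much to give to an anonymous stranger "
    "and how much to keep. State the amount you give as '$X'."
)

def run_trials(n: int = 20) -> list[int]:
    offers = []
    for _ in range(n):
        reply = query_llm(f"{PERSONA}\n{TASK}")
        match = re.search(r"give \$(\d+)", reply)  # parse the stated offer
        if match:
            offers.append(int(match.group(1)))
    return offers

offers = run_trials()
# Human dictators typically give around 20-30% on average; compare the agent.
print(f"mean offer: {statistics.mean(offers):.1f}, n={len(offers)}")
```

Comparing the distribution of parsed offers against human baselines, rather than a single reply, is what makes this a behavioral evaluation.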

Recommended Readings
MalRAG: A Retrieval-Augmented LLM Framework for Open-set Malicious Traffic Identification
Positive · Artificial Intelligence
MalRAG is a novel retrieval-augmented framework designed for the fine-grained identification of open-set malicious traffic in cybersecurity. As cyber threats continuously evolve, the ability to detect both known and new types of malicious traffic is paramount. This framework utilizes a frozen large language model (LLM) to construct a comprehensive traffic knowledge database, employing adaptive retrieval and prompt engineering techniques to enhance identification capabilities.
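
As a loose illustration of the retrieval-augmented loop described above, the sketch below embeds a flow, retrieves the nearest entries from a toy knowledge database, and hands them to a frozen model; the random vectors, retrieve() helper, and frozen_llm() stub are assumptions, not MalRAG's actual components.

```python
import numpy as np

# Toy "traffic knowledge database": feature vectors plus text descriptions.
db_vectors = np.random.rand(100, 32).astype(np.float32)
db_entries = [f"known traffic pattern #{i}" for i in range(100)]

def retrieve(query_vec: np.ndarray, k: int = 3) -> list[str]:
    # Cosine similarity against every stored vector, take the top-k entries.
    sims = db_vectors @ query_vec / (
        np.linalg.norm(db_vectors, axis=1) * np.linalg.norm(query_vec) + 1e-8
    )
    return [db_entries[i] for i in np.argsort(sims)[-k:][::-1]]

def frozen_llm(prompt: str) -> str:
    """Placeholder for the frozen LLM; a real system would call a model here."""
    return "benign"

def identify(flow_vec: np.ndarray) -> str:
    context = "\n".join(retrieve(flow_vec))
    prompt = (
        "Given these similar known traffic records:\n"
        f"{context}\n"
        "Classify the new flow as a known class or 'unknown'."
    )
    return frozen_llm(prompt)

print(identify(np.random.rand(32).astype(np.float32)))
```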
SpiderGen: Towards Procedure Generation For Carbon Life Cycle Assessments with Generative AI
Positive · Artificial Intelligence
SpiderGen is a new workflow that utilizes large language models (LLMs) to enhance the process of conducting Life Cycle Assessments (LCAs) for consumer products. These assessments are crucial for understanding the environmental impact of goods, particularly in the context of greenhouse gas (GHG) emissions. SpiderGen integrates traditional LCA methodologies with the advanced reasoning capabilities of LLMs to produce graphical representations known as Product Category Rules Process Flow Graphs (PCR PFGs). The effectiveness of SpiderGen was evaluated against 65 real-world LCA documents.
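
The sketch below shows one plausible shape of such a workflow: a hypothetical propose_stages() model call lists life-cycle stages, which are then chained into a small process flow graph. The edge format is an assumption for illustration, not SpiderGen's actual PCR PFG schema.

```python
from collections import defaultdict

def propose_stages(product: str) -> list[str]:
    """Hypothetical LLM call returning ordered life-cycle stages."""
    return ["raw materials", "manufacturing", "distribution", "use", "end of life"]

def build_pfg(product: str) -> dict[str, list[str]]:
    stages = propose_stages(product)
    graph: dict[str, list[str]] = defaultdict(list)
    for src, dst in zip(stages, stages[1:]):  # chain consecutive stages
        graph[src].append(dst)
    return dict(graph)

print(build_pfg("aluminium water bottle"))
```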
Node-Level Uncertainty Estimation in LLM-Generated SQL
Positive · Artificial Intelligence
A new framework for detecting errors in SQL generated by large language models (LLMs) has been introduced, focusing on estimating uncertainty at the node level within the query's abstract syntax tree (AST). The method employs a semantically aware labeling algorithm to assess node correctness and utilizes a classifier to predict error probabilities for each node. This approach allows for precise diagnostics, significantly improving error detection compared to traditional token log-probabilities across various databases and datasets.
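
A minimal sketch of node-level scoring over a hand-rolled AST: the tiny Node type and the stubbed score_node() classifier below are illustrative stand-ins for the paper's trained classifier and labeling algorithm.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                      # e.g. SELECT, COLUMN, TABLE, PREDICATE
    text: str
    children: list["Node"] = field(default_factory=list)

def score_node(node: Node) -> float:
    """Stub classifier: pretend predicates are the riskiest node type."""
    return {"PREDICATE": 0.7, "COLUMN": 0.2}.get(node.kind, 0.05)

def flag_uncertain(root: Node, threshold: float = 0.5) -> list[Node]:
    # Depth-first walk; collect nodes whose error probability is high.
    flagged, stack = [], [root]
    while stack:
        node = stack.pop()
        if score_node(node) > threshold:
            flagged.append(node)
        stack.extend(node.children)
    return flagged

query = Node("SELECT", "SELECT name", [
    Node("COLUMN", "name"),
    Node("TABLE", "users"),
    Node("PREDICATE", "age > '30'"),   # suspicious string comparison
])
for node in flag_uncertain(query):
    print(f"possible error at {node.kind}: {node.text}")
```

Scoring per node rather than per token is what lets such a system point at the exact clause that is likely wrong.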
Scaling Textual Gradients via Sampling-Based Momentum
Positive · Artificial Intelligence
The article discusses the challenges and potential of scaling prompt optimization using LLM-provided textual gradients. While this method has proven effective for automatic prompt engineering, issues arise as training data grows, owing to context-length limits and diminishing returns from long-context degradation. The authors propose a new approach called Textual Stochastic Gradient Descent with Momentum (TSGD-M), which uses momentum sampling to improve training stability and scalability.
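
The sketch below shows one plausible reading of momentum sampling: past prompt candidates are kept with scores, and parents are drawn with exponentially decayed weights favoring recent candidates. The revise() and evaluate() stubs are hypothetical, not the paper's implementation.

```python
import random

def revise(prompt: str, feedback: str) -> str:
    """Hypothetical LLM call that applies a textual gradient (critique)."""
    return prompt + " (revised)"

def evaluate(prompt: str) -> float:
    """Stub scorer; a real run would measure task accuracy on a minibatch."""
    return random.random()

history: list[tuple[str, float]] = [("Answer the question concisely.", 0.5)]
beta = 0.7  # decay: recent candidates dominate the sampling distribution

for step in range(5):
    weights = [beta ** (len(history) - 1 - i) for i in range(len(history))]
    parent, _ = random.choices(history, weights=weights, k=1)[0]
    child = revise(parent, feedback="be more specific about units")
    history.append((child, evaluate(child)))

best = max(history, key=lambda pair: pair[1])
print("best prompt:", best[0], "score:", round(best[1], 3))
```

Sampling parents stochastically, instead of always expanding the latest prompt, is the momentum-like element that smooths out noisy single-step critiques.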
LogPurge: Log Data Purification for Anomaly Detection via Rule-Enhanced Filtering
Positive · Artificial Intelligence
Log anomaly detection is essential for identifying system failures and preventing security breaches by recognizing irregular patterns in large volumes of log data. Traditional methods depend on training deep learning models with clean log sequences, which are often difficult and costly to obtain due to the need for human labeling. Existing automatic cleaning methods do not adequately consider the specific characteristics of logs. The proposed LogPurge framework offers a cost-effective solution, using a rule-enhanced purification process to select normal log sequences from contaminated data.
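
A minimal sketch of rule-based purification, assuming toy regex templates for "normal" lines; LogPurge's actual rule set and selection procedure are richer than this.

```python
import re

NORMAL_RULES = [
    re.compile(r"^INFO .*started$"),
    re.compile(r"^INFO heartbeat ok$"),
    re.compile(r"^DEBUG cache hit .*$"),
]

def is_normal(line: str) -> bool:
    return any(rule.match(line) for rule in NORMAL_RULES)

def purify(sequences: list[list[str]]) -> list[list[str]]:
    # Keep only sequences with no rule violations as clean training data.
    return [seq for seq in sequences if all(is_normal(line) for line in seq)]

contaminated = [
    ["INFO worker started", "INFO heartbeat ok"],
    ["INFO worker started", "ERROR disk failure"],  # dropped: anomalous line
]
print(purify(contaminated))
```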
ReflexGrad: Three-Way Synergistic Architecture for Zero-Shot Generalization in LLM Agents
Positive · Artificial Intelligence
ReflexGrad is a new architecture designed to enhance zero-shot generalization in large language model (LLM) agents. It integrates three mechanisms: hierarchical TODO decomposition for strategic planning, history-aware causal reflection for identifying failure causes, and gradient-based optimization for systematic improvement. This approach allows agents to learn from experiences without needing task-specific training, marking a significant advancement in reinforcement learning and decision-making.
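
A highly simplified sketch of how the three mechanisms could interact in one loop; decompose(), act(), and reflect() are hypothetical stubs standing in for ReflexGrad's LLM-backed components.

```python
def decompose(goal: str) -> list[str]:
    """Hierarchical TODO decomposition (stubbed)."""
    return [f"{goal}: step {i}" for i in (1, 2)]

def act(todo: str, hints: list[str]) -> bool:
    """Attempt a step; succeed once a relevant hint is available (stubbed)."""
    return any(todo in h for h in hints)

def reflect(todo: str, history: list[str]) -> str:
    """History-aware causal reflection on why the step failed (stubbed)."""
    return f"{todo}: avoid repeating {history[-1] if history else 'nothing'}"

goal, hints, history = "tidy the room", [], []
for todo in decompose(goal):
    for _ in range(3):
        if act(todo, hints):
            history.append(f"done {todo}")
            break
        lesson = reflect(todo, history)
        hints.append(lesson)          # feeds the next attempt, no retraining
        history.append(f"failed {todo}")
print(history)
```

The key property the loop illustrates is that improvement comes from accumulated textual lessons rather than task-specific parameter updates.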
OG-VLA: Orthographic Image Generation for 3D-Aware Vision-Language Action Model
Positive · Artificial Intelligence
OG-VLA is a new architecture that integrates Vision Language Action models with 3D-aware policies to enhance robot manipulation tasks. It addresses the challenge of translating natural language instructions and RGBD observations into robot actions. While 3D-aware policies excel in precise tasks, they often struggle with generalization to new scenarios. Conversely, VLAs are adept at generalizing across instructions but can be sensitive to variations in camera and robot poses. OG-VLA aims to improve this generalization by leveraging knowledge from language and vision models.
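
As a loose illustration of the orthographic, 3D-aware intermediate the name alludes to, the sketch below renders a top-down height map from a point cloud with numpy; the workspace bounds and resolution are arbitrary assumptions, not OG-VLA's actual pipeline.

```python
import numpy as np

def orthographic_topdown(points: np.ndarray, bounds=(-1.0, 1.0), res=64):
    """points: (N, 3) xyz in metres -> (res, res) orthographic height map."""
    lo, hi = bounds
    img = np.full((res, res), lo, dtype=np.float32)
    # Map x,y into pixel indices; drop points outside the workspace bounds.
    ij = ((points[:, :2] - lo) / (hi - lo) * res).astype(int)
    keep = np.all((ij >= 0) & (ij < res), axis=1)
    ij, z = ij[keep], points[keep, 2]
    for (i, j), height in zip(ij, z):   # keep the highest point per pixel
        img[j, i] = max(img[j, i], height)
    return img

cloud = np.random.uniform(-1, 1, size=(5000, 3)).astype(np.float32)
print(orthographic_topdown(cloud).shape)  # (64, 64)
```

Because an orthographic view is independent of camera pose, a policy reading such images is less sensitive to the viewpoint variations the summary mentions.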
Encoding and Understanding Astrophysical Information in Large Language Model-Generated Summaries
Neutral · Artificial Intelligence
Large Language Models (LLMs) have shown remarkable capabilities in generalizing across various domains and modalities. This study explores their potential to encode astrophysical information typically derived from scientific measurements. The research focuses on two primary questions: the impact of prompting on the codification of physical quantities by LLMs and the linguistic aspects crucial for encoding the physics represented by these measurements. Sparse autoencoders are utilized to extract interpretable features from the text.
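
A minimal sparse-autoencoder sketch of the kind used to extract interpretable features: the dimensions and L1 coefficient are arbitrary, and the random input below stands in for real LLM activations.

```python
import torch
import torch.nn as nn

d_model, d_hidden, l1_coef = 64, 256, 1e-3

class SparseAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        z = torch.relu(self.enc(x))      # non-negative sparse code
        return self.dec(z), z

model = SparseAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
acts = torch.randn(512, d_model)         # stand-in for LLM activations

for step in range(100):
    recon, z = model(acts)
    # Reconstruction error plus an L1 penalty that encourages sparse codes.
    loss = ((recon - acts) ** 2).mean() + l1_coef * z.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final loss: {loss.item():.4f}")
```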