Lost in Serialization: Invariance and Generalization of LLM Graph Reasoners

arXiv — cs.LG · Thursday, November 27, 2025 at 5:00:00 AM
  • Recent research shows that graph reasoners built on Large Language Models (LLMs) lack built-in invariance to symmetries in graph representations: their outputs change under meaning-preserving transformations such as node reindexing and edge reordering. This study systematically analyzes how fine-tuning affects encoding sensitivity and generalization in LLMs, proposing a decomposition of graph serializations so each factor can be evaluated separately (a minimal sketch of this kind of invariance check follows the summary).
  • The findings underscore the importance of robustness in LLMs, particularly in applications that require consistent reasoning across diverse graph representations. Larger, non-fine-tuned models were more robust, while fine-tuning reduced sensitivity to node relabeling but increased vulnerability to structural variations, raising concerns about reliability in practical scenarios.
  • This development reflects ongoing challenges in artificial intelligence, particularly around the generalization abilities of LLMs on complex tasks. Related issues, such as context drift in multi-turn interactions and reliance on grammatical shortcuts rather than domain knowledge, further complicate the picture and underscore the need for better methodologies for training and evaluating LLMs.
— via World Pulse Now AI Editorial System
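The paper's full serialization decomposition is not reproduced in this summary. As a rough illustration of the symmetry being tested, the Python sketch below serializes one toy graph two ways, with relabeled nodes and reordered edges, and checks whether a model's answers agree; `ask_llm` is a hypothetical stand-in for any text-in/text-out chat client, not an API from the paper.

```python
import random

def serialize(edges, mapping=None, shuffle=False, seed=0):
    """Render an edge list as text. Optionally relabel nodes and
    reorder edges -- two symmetries an ideal graph reasoner should
    be invariant to."""
    rng = random.Random(seed)
    if mapping is not None:
        edges = [(mapping[u], mapping[v]) for u, v in edges]
    else:
        edges = list(edges)
    if shuffle:
        rng.shuffle(edges)
    return "Edges: " + ", ".join(f"({u}, {v})" for u, v in edges)

# Toy graph: the path 0-1-2-3.
EDGES = [(0, 1), (1, 2), (2, 3)]

def invariance_gap(ask_llm, question):
    """Return True if the model answers differently on two
    serializations of the same underlying graph."""
    prompt_a = serialize(EDGES)
    prompt_b = serialize(EDGES, mapping={0: 3, 1: 0, 2: 2, 3: 1},
                         shuffle=True, seed=42)
    return ask_llm(f"{prompt_a}\n{question}") != ask_llm(f"{prompt_b}\n{question}")
```

A fully invariant reasoner would return False for any question whose answer depends only on the graph's structure.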

Continue Reading
DESIGNER: Design-Logic-Guided Multidisciplinary Data Synthesis for LLM Reasoning
Positive · Artificial Intelligence
DESIGNER, a recently introduced design-logic-guided reasoning data synthesis pipeline, aims to enhance the ability of large language models (LLMs) to tackle complex, multidisciplinary questions. By leveraging extensive raw documents, DESIGNER generates high-difficulty questions that challenge LLMs' reasoning abilities across various disciplines.
SkeletonAgent: An Agentic Interaction Framework for Skeleton-based Action Recognition
Positive · Artificial Intelligence
The SkeletonAgent framework has been introduced to enhance skeleton-based action recognition by integrating Large Language Models (LLMs) with a recognition model through two cooperative agents, the Questioner and Selector. This innovative approach aims to improve the accuracy of distinguishing similar actions by providing targeted guidance and feedback between the LLM and the recognition model.
Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language Understanding
Positive · Artificial Intelligence
Recent advancements in 3D scene-language understanding have led to the development of the 3D Spatial Language Instruction Mask (3D-SLIM), which enhances the reasoning capabilities of Large Language Models (LLMs) by replacing traditional causal attention masks with adaptive attention masks tailored to the spatial structures of 3D scenes. This innovation addresses key limitations in current methodologies, such as sequential bias and restricted attention in task-specific reasoning.
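The summary does not specify how 3D-SLIM actually constructs its adaptive masks; as a loose Python sketch of the general idea, assuming object center coordinates and a hypothetical distance threshold, a spatial-neighborhood mask replacing a causal mask might look like this:

```python
import torch

def spatial_attention_mask(positions, radius=2.0):
    """Additive attention mask that lets each object token attend only
    to objects within `radius` of it, in place of a causal mask.
    `positions`: (N, 3) object centers. The all-pairs-distance rule and
    the radius value are illustrative guesses, not 3D-SLIM's design."""
    dist = torch.cdist(positions, positions)         # (N, N) pairwise distances
    allowed = dist <= radius                         # spatial neighborhoods
    return torch.where(allowed, 0.0, float("-inf"))  # 0 = attend, -inf = block

pos = torch.tensor([[0.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [5.0, 5.0, 0.0]])
print(spatial_attention_mask(pos))
```

The point of such a mask is that which tokens may attend to which is decided by scene geometry rather than by token order.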
InEx: Hallucination Mitigation via Introspection and Cross-Modal Multi-Agent Collaboration
Positive · Artificial Intelligence
InEx is a novel, training-free multi-agent approach to mitigating hallucinations in large language models (LLMs) that combines introspective reasoning with cross-modal collaboration. The method aims to enhance the reliability of multimodal LLMs (MLLMs) by autonomously refining responses through iterative verification.
Deep Research: A Systematic Survey
Positive · Artificial Intelligence
A systematic survey on Deep Research (DR) has been published, highlighting the evolution of large language models (LLMs) from mere text generators to sophisticated problem solvers. This survey outlines a three-stage roadmap for integrating LLMs with external tools, enabling them to tackle complex tasks that require critical thinking and multi-source verification.
Towards Unification of Hallucination Detection and Fact Verification for Large Language Models
Positive · Artificial Intelligence
A new framework named UniFact has been introduced to unify Hallucination Detection (HD) and Fact Verification (FV) for Large Language Models (LLMs), addressing the prevalent issue of LLMs generating factually incorrect content, known as hallucinations. This initiative aims to bridge the gap between two previously isolated research paradigms, enhancing the evaluation of LLM outputs.
A benchmark dataset for evaluating Syndrome Differentiation and Treatment in large language models
Positive · Artificial Intelligence
A new benchmark dataset, TCM-BEST4SDT, has been proposed to evaluate the capabilities of Large Language Models (LLMs) in the context of Traditional Chinese Medicine (TCM), specifically focusing on Syndrome Differentiation and Treatment (SDT). This dataset aims to address the challenges posed by TCM's individualized and holistic nature, which current evaluation frameworks often overlook.
SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment
Positive · Artificial Intelligence
A new study introduces Stable Rank Group Relative Policy Optimization (SR-GRPO), which uses stable rank as an intrinsic quality signal for aligning Large Language Models (LLMs) with human preferences, addressing the limitations of methods that rely on external supervision. Stable rank measures the effective dimensionality of hidden states, and the method reportedly achieves notable improvements in task accuracy.
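Stable rank itself has a standard definition: the squared Frobenius norm of a matrix divided by its squared spectral norm. How SR-GRPO converts this quantity into a group-relative reward is not described above, so the snippet below only computes the quantity on a hypothetical hidden-state matrix.

```python
import torch

def stable_rank(h):
    """Stable rank of a hidden-state matrix h (tokens x dim):
    squared Frobenius norm divided by squared spectral norm.
    Standard definition; the reward wiring in SR-GRPO is not
    covered by the summary and is not shown here."""
    fro2 = (h ** 2).sum()
    spec = torch.linalg.matrix_norm(h, ord=2)  # largest singular value
    return fro2 / spec ** 2

h = torch.randn(128, 768)     # e.g., hidden states of one sampled response
print(float(stable_rank(h)))  # lies between 1 and min(128, 768)
```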