Pet-Bench: Benchmarking the Abilities of Large Language Models as E-Pets in Social Network Services
Positive · Artificial Intelligence
- A new benchmark called Pet-Bench has been introduced to evaluate the capabilities of Large Language Models (LLMs) as virtual pets in social network services. The benchmark assesses both self-interaction and human-interaction abilities, emphasizing the self-evolution and developmental behaviors that are crucial for simulating realistic pet companionship. The evaluation comprises over 7,500 interaction instances designed to reflect diverse pet behaviors.
- Pet-Bench is significant because it addresses a gap in existing research, which has focused primarily on basic pet role-playing interactions. By systematically benchmarking LLMs for comprehensive companionship, it aims to enhance user experiences in virtual environments, potentially enabling more engaging and emotionally rich interactions with AI.
- This advancement in LLM evaluation speaks to ongoing debates about the effectiveness and emotional depth of AI companions. While some studies reveal limitations in LLM-generated personas, particularly in low-resource settings, others emphasize the transformative potential of LLMs across applications ranging from academic disciplines to emotional expression. These contrasting findings underscore the need for robust evaluation frameworks that ensure equitable and effective AI interactions.
— via World Pulse Now AI Editorial System
