SoMe: A Realistic Benchmark for LLM-based Social Media Agents

arXiv — cs.CL • Thursday, December 18, 2025 at 5:00:00 AM

Neutral • Artificial Intelligence

A new benchmark, SoMe, evaluates social media agents built on large language models (LLMs), addressing the lack of a comprehensive assessment of how well such agents understand media content and user behavior. SoMe comprises 8 tasks, more than 9 million posts, and nearly 7,000 user profiles, giving researchers and developers a sizable resource for work on AI and social media.
The benchmark matters because it provides a structured framework for evaluating LLMs in social media settings, which increasingly shape public discourse and user interactions. By offering a realistic benchmark, SoMe aims to make LLM-based agents more reliable and effective in these environments.
SoMe arrives amid ongoing discussion of the role of LLMs in critical applications, including their safety and ethical implications. As LLMs are integrated into more sectors, concerns about memorized training data and potential bias have become more prominent, underscoring the need for robust evaluation metrics and frameworks that support responsible AI deployment.

— via World Pulse Now AI Editorial System
