arXiv:2511.08592v1 Announce Type: new 
Abstract: Large Language Models (LLMs) offer new avenues to simulate online communities and social media. Potential applications range from testing the design of content recommendation algorithms to estimating the effects of content policies and interventions. However, the validity of using LLMs to simulate conversations between various users remains largely untested. We evaluated whether LLMs can convincingly mimic human group conversations on social media. We collected authentic human conversations from Reddit and generated artificial conversations on the same topic with two LLMs: Llama 3 70B and GPT-4o. When presented side-by-side to study participants, LLM-generated conversations were mistaken for human-created content 39% of the time. In particular, when evaluating conversations generated by Llama 3, participants correctly identified them as AI-generated only 56% of the time, barely better than random chance. Our study demonstrates that LLMs can generate social media conversations sufficiently realistic to deceive humans when reading them, highlighting both a promising potential for social simulation and a warning message about the potential misuse of LLMs to generate new inauthentic social media content.

A recent study evaluated the ability of large language models (LLMs) to simulate realistic multi-user discussions on social media. Conducted using conversations from Reddit, the study found that LLM-generated dialogues were mistaken for human content 39% of the time, with Llama 3 being identified as AI-generated only 56% of the time. This research highlights both the potential for social simulation and the risks of misuse in generating inauthentic content.
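The "barely better than random chance" reading of the 56% figure depends on how many judgments sit behind it, which the abstract does not report. As a rough illustration only, the sketch below runs an exact two-sided binomial test against a 50% guessing rate. The trial counts (100 and 1000) are hypothetical values chosen to show the effect of sample size; they are not taken from the study, and this is not the authors' analysis.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely than the
    observed one under the null hypothesis (success rate == p_null).
    """
    def pmf(k: int) -> float:
        return comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)

    observed = pmf(successes)
    # Small relative tolerance so outcomes tied with the observed one are included.
    total = sum(pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical sample sizes: the abstract gives only the 56% rate, not the count.
p_small = binomial_two_sided_p(successes=56, trials=100)
p_large = binomial_two_sided_p(successes=560, trials=1000)

print(f"56/100 correct:   p = {p_small:.4f}")
print(f"560/1000 correct: p = {p_large:.6f}")
```

Under these assumed counts, the same 56% accuracy is statistically indistinguishable from guessing at 100 judgments but clearly above chance at 1000, so the claim is best read alongside the sample size reported in the full paper.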

The Collective Turing Test: Large Language Models Can Generate Realistic Multi-User Discussions
