arXiv:2510.17013v2
Abstract: Recent LLM benchmarks have tested models on a range of phenomena, but are still focused primarily on natural language understanding for extraction of explicit information, such as QA or summarization, with responses often targeting information from individual sentences. We are still lacking more challenging, and importantly also multilingual, benchmarks focusing on implicit information and pragmatic inferences across larger documents in the context of discourse tracking: integrating and aggregating information across sentences, paragraphs and multiple speaker utterances. To this end, we present DiscoTrack, an LLM benchmark targeting a range of tasks across 12 languages and four levels of discourse understanding: salience recognition, entity tracking, discourse relations and bridging inference. Our evaluation shows that these tasks remain challenging, even for state-of-the-art models.

DiscoTrack is a new multilingual benchmark designed to evaluate discourse tracking in language models. Unlike previous benchmarks, which mainly focus on extracting explicit information, DiscoTrack targets implicit information and pragmatic inferences across larger documents. It covers 12 languages and four levels of discourse understanding: salience recognition, entity tracking, discourse relations and bridging inference.

DiscoTrack: A Multilingual LLM Benchmark for Discourse Tracking
