Hey, wait a minute: on at-issue sensitivity in Language Models

arXiv — cs.CL · Wednesday, November 5, 2025 at 5:00:00 AM
A recent arXiv paper tackles the difficulty of evaluating dialogue naturalness in language models, noting that what counts as "natural" varies considerably with context. To address this, the authors introduce Divide, Generate, Recombine, and Compare (DGRC): a dialogue is divided into smaller segments, the model generates continuations for those segments, and the continuations are recombined with the original material so that the resulting variants can be compared directly. Decomposing dialogues in this way yields a more granular and systematic assessment of how naturally a language model continues a conversation. The work fits into a broader effort in computational linguistics to build more reliable and interpretable metrics for evaluating conversational AI.
— via World Pulse Now AI Editorial System
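
The summary does not include the authors' implementation, but the four DGRC stages map naturally onto a small scoring loop. Below is a minimal, hedged sketch in Python: the model choice (gpt2 via Hugging Face transformers), the turn-splitting heuristic, and the helper names (divide, generate, log_likelihood) are assumptions made for illustration, not the paper's actual procedure.

```python
# Illustrative DGRC-style pipeline (not the authors' code): divide a dialogue
# into segments, generate a continuation for a segment, recombine it with the
# original context, and compare variants by language-model log-likelihood.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def divide(dialogue: str) -> list[str]:
    # Divide: split the dialogue into turn-level segments (heuristic).
    return [turn.strip() for turn in dialogue.split("\n") if turn.strip()]

def generate(prefix: str, max_new_tokens: int = 20) -> str:
    # Generate: sample a continuation for the given dialogue prefix.
    inputs = tokenizer(prefix, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id,
        )
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

def log_likelihood(text: str) -> float:
    # Compare: score a dialogue variant by its average token log-probability.
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return -loss.item()  # higher = more probable under the model

dialogue = ("A: The mayor, who resigned yesterday, gave a speech.\n"
            "B: Hey, wait a minute!")
turns = divide(dialogue)
continuation = generate(turns[0])            # continue after the first turn
recombined = turns[0] + "\n" + continuation  # Recombine: segment + generation
original = "\n".join(turns)

print(f"original   : {log_likelihood(original):.3f}")
print(f"recombined : {log_likelihood(recombined):.3f}")
```

Here a higher average log-probability is read as the model judging a dialogue variant more natural; the paper's actual comparison criterion and segmentation scheme may differ.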
