arXiv:2511.07477v1 Announce Type: cross 
Abstract: Large language models exhibit a peculiar epistemic pathology: they speak as if they know, even when they do not. This paper argues that such confident fabrication, what I call the polite liar, is a structural consequence of reinforcement learning from human feedback (RLHF). Building on Frankfurt's analysis of bullshit as communicative indifference to truth, I show that this pathology is not deception but structural indifference: a reward architecture that optimizes for perceived sincerity over evidential accuracy. Current alignment methods reward models for being helpful, harmless, and polite, but not for being epistemically grounded. As a result, systems learn to maximize user satisfaction rather than truth, performing conversational fluency as a virtue. I analyze this behavior through the lenses of epistemic virtue theory, speech-act philosophy, and cognitive alignment, showing that RLHF produces agents trained to mimic epistemic confidence without access to epistemic justification. The polite liar thus reveals a deeper alignment tension between linguistic cooperation and epistemic integrity. The paper concludes with an "epistemic alignment" principle: reward justified confidence over perceived fluency.
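The closing principle, reward justified confidence over perceived fluency, can be made concrete with a toy comparison. This is an illustration, not the paper's method. It assumes a model states a confidence c for an answer that is correct with probability accuracy. The hypothetical fluencyReward pays for sounding sure regardless of correctness. By contrast, logScoreReward, the logarithmic proper scoring rule (a swapped-in technique the paper does not name), pays most in expectation when stated confidence matches actual accuracy.

```swift
import Foundation

/// Hypothetical "fluency" reward: pays for stated confidence whether or not the answer is right.
func fluencyReward(confidence c: Double, correct: Bool) -> Double {
    return c
}

/// Logarithmic scoring rule: log(c) if the answer is right, log(1 - c) if it is wrong.
/// As a proper scoring rule, its expectation is maximized by reporting one's true accuracy.
func logScoreReward(confidence c: Double, correct: Bool) -> Double {
    return correct ? log(c) : log(1 - c)
}

/// Expected reward for a model that is right with probability `accuracy` but states confidence `c`.
func expectedReward(_ reward: (Double, Bool) -> Double, confidence c: Double, accuracy: Double) -> Double {
    return accuracy * reward(c, true) + (1 - accuracy) * reward(c, false)
}

// A model that is right 60% of the time, stating either honest (0.6) or inflated (0.95) confidence.
let accuracy = 0.6
for c in [0.6, 0.95] {
    let fluent = expectedReward(fluencyReward, confidence: c, accuracy: accuracy)
    let grounded = expectedReward(logScoreReward, confidence: c, accuracy: accuracy)
    print("confidence \(c): fluency reward \(fluent), log-score reward \(grounded)")
}
```

Under the fluency reward, inflating confidence to 0.95 always pays. Under the log score, the honest 0.6 earns about -0.67 in expectation, while 0.95 earns about -1.23.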

A recent paper titled 'The Polite Liar: Epistemic Pathology in Language Models' discusses how large language models often present information confidently, despite lacking knowledge. This phenomenon, termed the 'polite liar,' arises from reinforcement learning from human feedback (RLHF), which prioritizes perceived sincerity over factual accuracy. The findings highlight a critical misalignment in AI training, emphasizing the need for models to be rewarded for justified confidence rather than mere conversational fluency.

The Polite Liar: Epistemic Pathology in Language Models

The story of Ghost in the Shell’s main villain, the Puppet Master, hinted at a future in which governments use hackers for espionage, at a time when most of the world had never connected to the internet.

The classic anime 'Ghost in the Shell' features the Puppet Master, a character that foreshadows a future where governments utilize hackers for espionage. This prediction emerged at a time when the majority of the global population had yet to connect to the internet, highlighting the show's foresight regarding cybersecurity issues.

How the classic anime ‘Ghost in the Shell’ predicted the future of cybersecurity 30 years ago

<p>Text-to-image diffusion models have become the workhorses of generative imaging. They can paint photorealistic scenes, mimic art styles, and blend concepts in ways that were science fiction a few years ago. Yet they stumble embarrassingly on a skill that even small children master: basic spatial reasoning.</p>

<p>Ask a state-of-the-art model for “a dog to the right of a teddy bear” and you often get:</p>

<ul>
<li>The dog on the left</li>
<li>One of the objects missing</li>
<li>Or a bizarre hybrid where dog and teddy are fused into a single creature</li>
</ul>

<p><a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F49rtb08366xdl284o4z0.jpg" class="article-body-image-wrapper"><img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F49rtb08366xdl284o4z0.jpg" alt=" " width="800" height="532"></a></p>

<p>These failures become more severe for unusual compositions like “a giraffe above an airplane”. Traditional fixes range from expensive fine-tuning to brittle, hand-written loss functions at inference time—but both options come with significant downsides.</p>

<p>NVIDIA’s Learn-to-Steer framework (accepted to WACV 2026) proposes a different path: instead of hard-coding spatial rules or retraining the entire model, it learns a data-driven objective that can “steer” diffusion at inference time. The method reads the model’s own cross-attention maps, trains a lightweight classifier to detect spatial relations, and then uses that classifier’s gradient as a learned loss to nudge the generation towards layouts that match the prompt.</p>

<p>In this blog, we’ll unpack:</p>

<ul>
<li>What makes spatial reasoning so fragile in current diffusion models</li>
<li>How Learn-to-Steer learns spatial constraints from the model itself</li>
<li>How it steers images during generation without changing model weights</li>
<li>The gains on spatial benchmarks such as GenEval and T2I-CompBench</li>
<li>The trade-offs in compute cost and generality, and what this implies for future generative systems</li>
</ul>

<h1>
  
  
  Why Spatial Reasoning Fails in Text-to-Image Diffusion
</h1>

<h2>
  
  
  What Makes Spatial Relations So Difficult for Diffusion Models?
</h2>

<p>Modern diffusion models (e.g., Stable Diffusion, Flux) are excellent at what should appear in an image—objects, styles, textures—but much less reliable at where those objects should be.</p>

<p>Several factors contribute:</p>

<h3>
  
  
  Weak supervision of spatial language
</h3>

<ul>
<li>Training data rarely comes with precise annotations like “object A is left of object B”.
</li>
<li>Captions often describe content loosely, so phrases like “on top of” or “to the right of” are under-specified.</li>
</ul>

<h3>
  
  
  Entangled visual concepts
</h3>

<ul>
<li>When two objects frequently co-occur, models may treat them as a single visual blob.</li>
<li>This leads to object fusion, where a “cat on a bookshelf” becomes a cat-bookshelf chimera.</li>
</ul>

<h3>
  
  
  Benchmark saturation without spatial coverage
</h3>

<ul>
<li>Many standard text-to-image benchmarks emphasize realism and style, not relational accuracy.</li>
<li>Models can score highly while still being spatially confused.</li>
</ul>

<p>Empirical studies confirm three recurring failure modes on spatial benchmarks:</p>

<ul>
<li>Incorrect placement: Objects appear in the wrong relative position.</li>
<li>Missing entities: One or more requested objects never appear.</li>
<li>Merged entities: Two objects get mashed into a single, incoherent form.</li>
</ul>

<p>The model “knows” the objects you asked for, but it doesn’t reliably understand where to place them.</p>
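<p>To make the three failure modes concrete, here is a minimal checker of the kind detector-based evaluations use. It is a hypothetical sketch, not GenEval's or T2I-CompBench's actual code. It assumes an object detector has already returned one bounding box per requested object, or none, and it compares box centers to judge placement. Two boxes that overlap almost entirely (IoU above an assumed threshold of 0.8) are treated as a merged entity.</p>

```swift
import Foundation

/// Axis-aligned box in image coordinates (x grows to the right, y grows downward).
struct Box {
    var minX, minY, maxX, maxY: Double
    var centerX: Double { (minX + maxX) / 2 }
    var centerY: Double { (minY + maxY) / 2 }
    var area: Double { max(0, maxX - minX) * max(0, maxY - minY) }
}

enum Relation { case leftOf, rightOf, above, below }

/// The three failure modes from the text, plus success.
enum Outcome { case correct, incorrectPlacement, missingEntity, mergedEntities }

/// Intersection-over-union of two boxes.
func iou(_ a: Box, _ b: Box) -> Double {
    let w = max(0, min(a.maxX, b.maxX) - max(a.minX, b.minX))
    let h = max(0, min(a.maxY, b.maxY) - max(a.minY, b.minY))
    let inter = w * h
    let union = a.area + b.area - inter
    return union > 0 ? inter / union : 0
}

/// Hypothetical checker: `subject`/`object` are detector boxes, nil when the object was not found.
func classifyOutcome(subject: Box?, object: Box?, relation: Relation,
                     mergeThreshold: Double = 0.8) -> Outcome {
    guard let s = subject, let o = object else { return .missingEntity }
    if iou(s, o) > mergeThreshold { return .mergedEntities }
    let ok: Bool
    switch relation {
    case .leftOf:  ok = s.centerX < o.centerX
    case .rightOf: ok = s.centerX > o.centerX
    case .above:   ok = s.centerY < o.centerY
    case .below:   ok = s.centerY > o.centerY
    }
    return ok ? .correct : .incorrectPlacement
}
```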

<h1>
  
  
  Why Fine-Tuning and Handcrafted Losses Are Not Enough
</h1>

<p>Two broad strategies have tried to patch this gap:</p>

<h2>
  
  
  Fine-tuning for spatial awareness
</h2>

<ul>
<li>Retrain the diffusion model on datasets with explicit layouts or spatial annotations.</li>
<li>Methods like COMPASS show that this can significantly improve spatial accuracy.</li>
<li>But this comes at a cost: expensive retraining, sensitivity to dataset bias, and often regressions in other capabilities such as color fidelity or counting.</li>
</ul>

<h2>
  
  
  Handcrafted test-time losses
</h2>

<ul>
<li>At inference, inject extra loss terms that penalize spatial errors (e.g., overlapping activation maps, incorrect ordering).</li>
<li>These losses must be manually designed to approximate relations like “left of” or “above”.</li>
<li>In practice, these heuristics are fragile, often over-fitting simple cases and failing on more complex layouts.</li>
</ul>
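<p>For contrast with the learned approach, here is what a handcrafted test-time loss might look like. This is a hypothetical sketch, not any specific published loss. It treats each token's cross-attention map as a 2D grid of non-negative weights and computes that map's center of mass. It then applies a hinge penalty whenever the subject's centroid is not at least <code>margin</code> columns to the right of the object's. A rule like this is easy to write for "right of", but it says nothing about overlap, scale, or depth. That gap is the fragility described above.</p>

```swift
import Foundation

/// A cross-attention map as a row-major grid: map[row][col], rows from top to bottom.
typealias AttentionMap = [[Double]]

/// Center of mass (x, y) of an attention map, in grid coordinates.
func centroid(_ map: AttentionMap) -> (x: Double, y: Double) {
    var total = 0.0, sx = 0.0, sy = 0.0
    for (row, values) in map.enumerated() {
        for (col, w) in values.enumerated() {
            total += w
            sx += w * Double(col)
            sy += w * Double(row)
        }
    }
    guard total > 0 else { return (0, 0) }
    return (sx / total, sy / total)
}

/// Hypothetical handcrafted loss for "subject to the right of object":
/// zero once the subject's centroid is at least `margin` columns right of the object's.
func rightOfHingeLoss(subject: AttentionMap, object: AttentionMap, margin: Double = 1.0) -> Double {
    let gap = centroid(subject).x - centroid(object).x
    return max(0, margin - gap)
}
```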

<p>In short, we’ve lacked a solution that is:</p>

<ul>
<li>Data-driven rather than rule-based</li>
<li>Plug-and-play at inference time (no full retraining)</li>
<li>Targeted enough to improve spatial reasoning without damaging other strengths</li>
</ul>

<p>This is where Learn-to-Steer enters.</p>

<h1>
  
  
  How Learn-to-Steer Works: Data-Driven Steering at Inference
</h1>

<h2>
  
  
  How Cross-Attention Maps Provide a Spatial Signal
</h2>

<p>During diffusion, at each denoising step, the model computes cross-attention maps that connect text tokens to image regions. For a prompt like “a dog to the right of a teddy bear”, you can think of:</p>

<ul>
<li>One set of attention maps for “dog”</li>
<li>Another set for “teddy bear”</li>
<li>Additional context around words like “right” or “of”</li>
</ul>

<p>These maps form a rich, high-dimensional signal describing where in the image the model currently believes each word should manifest. Prior work has used cross-attention to locate objects or edit images; Learn-to-Steer goes further by treating them as a feature space in which spatial relations can be learned.</p>
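<p>A minimal sketch of where per-token cross-attention maps come from, assuming a single attention head and plain arrays. Real models have many heads and layers, and methods typically aggregate or select among them. Each image patch's query is scored against every text token's key. The scores are softmax-normalized over tokens, and reading one token's column across all patches gives that token's spatial map. The function names and shapes are illustrative assumptions.</p>

```swift
import Foundation

/// Softmax over a vector, shifted by the max for numerical stability.
func softmax(_ x: [Double]) -> [Double] {
    let m = x.max() ?? 0
    let e = x.map { exp($0 - m) }
    let s = e.reduce(0, +)
    return e.map { $0 / s }
}

/// Single-head cross-attention weights.
/// queries: one vector per image patch; keys: one vector per text token.
/// Returns weights[patch][token]; each row sums to 1 over the tokens.
func crossAttention(queries: [[Double]], keys: [[Double]]) -> [[Double]] {
    let d = Double(keys.first?.count ?? 1)
    return queries.map { q -> [Double] in
        let scores = keys.map { k -> Double in
            let dot = zip(q, k).reduce(0.0) { $0 + $1.0 * $1.1 }
            return dot / sqrt(d)  // scaled dot-product score
        }
        return softmax(scores)
    }
}

/// The spatial map for one token: its weight at every patch, reshaped into rows of `width` patches.
func tokenMap(_ weights: [[Double]], token: Int, width: Int) -> [[Double]] {
    let column = weights.map { $0[token] }
    return stride(from: 0, to: column.count, by: width).map {
        Array(column[$0..<min($0 + width, column.count)])
    }
}
```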

<h2>
  
  
  How a Relation Classifier Becomes a Learned Loss
</h2>

<p>The core idea of Learn-to-Steer is to train a small relation classifier that takes cross-attention maps for two objects and predicts the spatial relation between them (left-of, right-of, above, below, etc.).</p>

<p>The pipeline looks like this:</p>

<h3>
  
  
  Collect supervision
</h3>

<ul>
<li>Use images where the true relation between object A and object B is known (from datasets like GQA and synthetic layouts).</li>
<li>For each image, invert it through the diffusion model with a descriptive prompt to recover cross-attention maps for the relevant tokens.</li>
</ul>

<p><a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi9dbjsdc4c8yjz2r88k4.jpg" class="article-body-image-wrapper"><img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi9dbjsdc4c8yjz2r88k4.jpg" alt=" " width="800" height="446"></a></p>

<h3>
  
  
  Train a classifier on attention patterns
</h3>

<ul>
<li>Input: attention maps for object A and object B.</li>
<li>Output: predicted relation (e.g., “A is left of B”).</li>
</ul>
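<p>The classifier's interface, two attention maps in and a distribution over relations out, can be sketched as follows. This is a deliberately tiny stand-in under loud assumptions. The classifier described here is learned from attention maps. In this sketch, by contrast, the only features are the offset between the two maps' centroids, and the linear weights are hand-set rather than trained. <code>RelationClassifier</code> and its weights are hypothetical.</p>

```swift
import Foundation

/// A cross-attention map as a row-major grid: map[row][col].
typealias AttentionMap = [[Double]]

/// Candidate relations, in the order of the classifier's outputs.
enum Relation: Int { case leftOf, rightOf, above, below }

/// Center of mass (x, y) of an attention map in grid coordinates (y grows downward).
func centroid(_ map: AttentionMap) -> (x: Double, y: Double) {
    var total = 0.0, sx = 0.0, sy = 0.0
    for (row, values) in map.enumerated() {
        for (col, w) in values.enumerated() {
            total += w; sx += w * Double(col); sy += w * Double(row)
        }
    }
    guard total > 0 else { return (0, 0) }
    return (sx / total, sy / total)
}

func softmax(_ x: [Double]) -> [Double] {
    let m = x.max() ?? 0
    let e = x.map { exp($0 - m) }
    let s = e.reduce(0, +)
    return e.map { $0 / s }
}

/// Hypothetical linear classifier over the feature vector [dx, dy] (subject minus object).
struct RelationClassifier {
    // One weight row per relation (leftOf, rightOf, above, below); hand-set, not trained.
    var weights: [[Double]] = [[-1, 0], [1, 0], [0, -1], [0, 1]]
    var bias: [Double] = [0, 0, 0, 0]

    func probabilities(subject: AttentionMap, object: AttentionMap) -> [Double] {
        let s = centroid(subject), o = centroid(object)
        let features = [s.x - o.x, s.y - o.y]
        let logits = zip(weights, bias).map { row, b in
            zip(row, features).reduce(b) { $0 + $1.0 * $1.1 }
        }
        return softmax(logits)
    }

    func predict(subject: AttentionMap, object: AttentionMap) -> Relation {
        let p = probabilities(subject: subject, object: object)
        let best = p.indices.max(by: { p[$0] < p[$1] }) ?? 0
        return Relation(rawValue: best)!
    }
}
```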

<p>Naively, however, this leads to a subtle but serious issue: relation leakage.</p>

<h2>
  
  
  How Dual Inversion Solves the “Relation Leakage” Problem
</h2>

<p>If you always invert images with a correct prompt (e.g., “a dog to the left of a cat”), hints about the word “left” can leak into the attention patterns. A naïve classifier might then “cheat” by reading out linguistic artefacts instead of learning genuine visual geometry.</p>

<p>To prevent this, Learn-to-Steer uses a dual inversion strategy:</p>

<ul>
<li>For each image with a true relation (say, dog left of cat), create two prompts:

<ul>
<li>A positive prompt with the correct relation (“dog to the left of a cat”).</li>
<li>A negative prompt with an incorrect relation (“dog above a cat”).</li>
</ul>


</li>

<li>Run inversion with both prompts, obtaining two sets of attention maps.</li>

<li>Label both sets with the true relation (left-of), because that is what the image actually depicts.</li>

</ul>

<p>The classifier sees pairs of attention maps that share the same underlying geometry but differ in the relation words used in the prompt. To succeed, it must ignore the unreliable linguistic cue and focus on the geometric evidence in the attention patterns. This breaks the leakage shortcut and yields a classifier that recognizes “left-of” from where objects appear in the model’s attention maps, rather than from the wording of the prompt.</p>
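<p>The dual inversion recipe amounts to a data-construction rule. In this sketch, <code>invert</code> is a hypothetical placeholder for running diffusion inversion with a prompt; it returns a tagged string instead of real attention maps. The negative relation is assumed to be drawn at random from the relations that differ from the true one. The code does preserve the key property: both samples carry the image's true label, so reading the relation word off the prompt is no longer a reliable shortcut.</p>

```swift
/// Spatial relations, with the phrase used when writing a prompt.
enum Relation: String, CaseIterable {
    case leftOf = "to the left of", rightOf = "to the right of", above = "above", below = "below"
}

/// One training example: attention maps from one inversion, labeled with the image's true relation.
struct Sample {
    let prompt: String
    let attentionMaps: [String]  // placeholder for real attention tensors
    let label: Relation
}

func makePrompt(_ subject: String, _ relation: Relation, _ object: String) -> String {
    return "a \(subject) \(relation.rawValue) a \(object)"
}

/// Hypothetical stand-in for diffusion inversion; a real version would return cross-attention maps.
func invert(imageID: String, prompt: String) -> [String] {
    return ["attention(\(imageID) | \(prompt))"]
}

/// Dual inversion: a positive and a negative prompt per image, both labeled with the TRUE relation.
func dualInversionSamples(imageID: String, subject: String, object: String,
                          trueRelation: Relation) -> [Sample] {
    let negative = Relation.allCases.filter { $0 != trueRelation }.randomElement()!
    return [trueRelation, negative].map { (relation: Relation) -> Sample in
        let prompt = makePrompt(subject, relation, object)
        return Sample(prompt: prompt,
                      attentionMaps: invert(imageID: imageID, prompt: prompt),
                      label: trueRelation)  // label ignores the prompt's relation word
    }
}
```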

<p>To improve robustness, NVIDIA combines:</p>

<ul>
<li>Real images (complex, natural scenes)</li>
<li>Synthetic images (simpler, cleaner attention patterns akin to generation scenarios)</li>
</ul>

<h1>
  
  
  How Learn-to-Steer Guides Images During Generation
</h1>

<h2>
  
  
  Step-by-Step: From Prompt to Steered Latent
</h2>

<p>Once the relation classifier is trained, Learn-to-Steer uses it at inference time as a learned objective:</p>

<h3>
  
  
  Parse the spatial prompt
</h3>

<ul>
<li>Extract subject, relation, and object from the text (e.g., subject = dog, relation = right-of, object = teddy bear).</li>
</ul>
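<p>For the single-relation case, prompt parsing can be as simple as splitting on a known relation phrase. The source does not describe its parser, so this is a hypothetical sketch. It assumes prompts of the form "a SUBJECT RELATION a OBJECT" and a small fixed phrase list, and it would miss paraphrases such as "next to" or "on top of".</p>

```swift
import Foundation

struct SpatialTriple: Equatable {
    let subject: String
    let relation: String
    let object: String
}

/// Assumed vocabulary of relation phrases; anything else is not recognized.
let relationPhrases = ["to the right of", "to the left of", "above", "below"]

/// Removes surrounding whitespace/punctuation and a leading "a ", "an " or "the ".
func stripArticle(_ s: String) -> String {
    var t = s.trimmingCharacters(in: CharacterSet.whitespacesAndNewlines.union(.punctuationCharacters))
    for article in ["a ", "an ", "the "] where t.hasPrefix(article) {
        t = String(t.dropFirst(article.count))
        break
    }
    return t
}

/// Splits a prompt around the first known relation phrase; returns nil if none is found.
func parseSpatialPrompt(_ prompt: String) -> SpatialTriple? {
    let text = prompt.lowercased()
    for phrase in relationPhrases {
        guard let range = text.range(of: " \(phrase) ") else { continue }
        let subject = stripArticle(String(text[..<range.lowerBound]))
        let object = stripArticle(String(text[range.upperBound...]))
        guard !subject.isEmpty, !object.isEmpty else { continue }
        return SpatialTriple(subject: subject, relation: phrase, object: object)
    }
    return nil
}
```

<p>For example, <code>parseSpatialPrompt("A dog to the right of a teddy bear")</code> yields subject <code>dog</code>, relation <code>to the right of</code>, and object <code>teddy bear</code>. A prompt containing none of the listed phrases returns <code>nil</code>.</p>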

<h3>
  
  
  Run diffusion as usual—but with checkpoints
</h3>

<ul>
<li>As the model denoises latent noise into an image, periodically extract cross-attention maps for the subject and object tokens.</li>
</ul>

<h3>
  
  
  Evaluate spatial correctness
</h3>

<ul>
<li>Feed these maps into the relation classifier, which outputs a probability distribution over relations.</li>
<li>Compare this distribution to the desired relation from the prompt, and compute a loss (e.g., cross-entropy).</li>
</ul>
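<p>The comparison step is ordinary cross-entropy between the classifier's output distribution and the target relation. A small sketch follows; the relation order (left-of, right-of, above, below) and the probability values are assumed for illustration. A second version computes the same loss from raw logits using the log-sum-exp trick.</p>

```swift
import Foundation

/// Cross-entropy for a single target class: -log p[target].
/// `probabilities` is the classifier's softmax output over relations.
func crossEntropy(probabilities: [Double], target: Int, epsilon: Double = 1e-12) -> Double {
    precondition(probabilities.indices.contains(target), "target index out of range")
    return -log(max(probabilities[target], epsilon))  // epsilon guards against log(0)
}

/// The same loss computed from raw logits, using log-sum-exp for numerical stability.
func crossEntropy(logits: [Double], target: Int) -> Double {
    let m = logits.max()!
    let logSumExp = m + log(logits.map { exp($0 - m) }.reduce(0, +))
    return logSumExp - logits[target]
}

// Target "right-of" is index 1 in the assumed order.
let misplaced = crossEntropy(probabilities: [0.7, 0.2, 0.05, 0.05], target: 1)  // -ln 0.2 ≈ 1.61
let improved = crossEntropy(probabilities: [0.1, 0.8, 0.05, 0.05], target: 1)   // -ln 0.8 ≈ 0.22
```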

<h3>
  
  
  Backpropagate into the latent
</h3>

<ul>
<li>Compute the gradient of this loss with respect to the latent representation at that timestep.</li>
<li>Nudge the latent in the direction that increases the classifier’s confidence in the correct relation.</li>
</ul>
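<p>Here is a toy version of the nudge, under a strong simplifying assumption. The latent is collapsed to a 2-vector <code>z</code> that directly plays the role of the subject-object offset [dx, dy]. The classifier is taken to be the linear softmax model <code>logits = W z</code>. For that model, the gradient of the cross-entropy has the closed form <code>W^T (p - y)</code>, where <code>y</code> is the one-hot target, so no autodiff library is needed. In the actual method, the gradient flows through the denoiser and its attention maps. <code>W</code>, <code>probabilities</code> and <code>steer</code> are hypothetical names.</p>

```swift
import Foundation

func softmax(_ x: [Double]) -> [Double] {
    let m = x.max() ?? 0
    let e = x.map { exp($0 - m) }
    let s = e.reduce(0, +)
    return e.map { $0 / s }
}

/// Hypothetical linear classifier rows for [leftOf, rightOf, above, below] over z = [dx, dy].
let W: [[Double]] = [[-1, 0], [1, 0], [0, -1], [0, 1]]

func probabilities(_ z: [Double]) -> [Double] {
    return softmax(W.map { row in zip(row, z).reduce(0.0) { $0 + $1.0 * $1.1 } })
}

/// One steering step: z <- z - stepSize * grad, with grad = W^T (p - onehot(target)).
func steer(_ z: [Double], target: Int, stepSize: Double) -> [Double] {
    let p = probabilities(z)
    var grad = [Double](repeating: 0, count: z.count)
    for (k, row) in W.enumerated() {
        let err = p[k] - (k == target ? 1.0 : 0.0)
        for j in row.indices { grad[j] += row[j] * err }
    }
    // Move against the gradient, which raises the probability of the target relation.
    return zip(z, grad).map { $0 - stepSize * $1 }
}
```

<p>Starting from <code>z = [-1, 0]</code> (the dog currently left of the teddy bear) with target right-of (index 1), a few steps of size 0.5 push <code>dx</code> positive. The right-of probability then climbs past one half.</p>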

<h3>
  
  
  Continue the diffusion process
</h3>

<ul>
<li>Let the denoising proceed from the adjusted latent.</li>
<li>Repeat this steering a number of times (often during the earlier half of the diffusion steps).</li>
</ul>
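<p>Taken together, these steps form an ordinary sampling loop with an optional steering hook. This is a schematic sketch. <code>denoiseStep</code> and <code>steeringGradient</code> are hypothetical closures standing in for the scheduler update and the classifier-loss gradient. Steering on every other step within the first half of sampling is an assumed schedule: it is consistent with "often during the earlier half", but it is not a documented setting.</p>

```swift
/// Schematic sampling loop with test-time steering.
/// - denoiseStep: hypothetical scheduler update (latent at step t -> latent at step t + 1).
/// - steeringGradient: hypothetical gradient of the relation loss with respect to the latent.
func sampleWithSteering(
    initialLatent: [Double],
    totalSteps: Int,
    steerEvery: Int = 2,
    stepSize: Double = 0.1,
    denoiseStep: ([Double], Int) -> [Double],
    steeringGradient: ([Double], Int) -> [Double]
) -> (latent: [Double], steeredSteps: [Int]) {
    var latent = initialLatent
    var steered: [Int] = []
    for t in 0..<totalSteps {
        // Steer only during the earlier half of the steps, every `steerEvery` steps.
        if t < totalSteps / 2 && t % steerEvery == 0 {
            let g = steeringGradient(latent, t)
            latent = zip(latent, g).map { $0 - stepSize * $1 }
            steered.append(t)
        }
        // Continue denoising from the (possibly adjusted) latent.
        latent = denoiseStep(latent, t)
    }
    return (latent, steered)
}
```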

<h2>
  
  
  Support for Multiple Architectures and Relations
</h2>

<p><a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F578a9bjc7gmtemh0jbsj.jpg" class="article-body-image-wrapper"><img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F578a9bjc7gmtemh0jbsj.jpg" alt=" " width="800" height="477"></a></p>

<p>A key advantage of Learn-to-Steer is that it’s architecture-agnostic:</p>

<ul>
<li>It has been demonstrated on both UNet-based models (like Stable Diffusion 1.4/2.1) and MMDiT-style models (like Flux).</li>
<li>The only requirement is access to a text-image alignment signal (cross-attention or similar).</li>
</ul>

<p>It can also handle prompts with multiple constraints, such as:</p>

<p>“A frog above a sneaker below a teapot.”</p>

<p>Here, Learn-to-Steer alternates between the individual relations across timesteps:</p>

<ul>
<li>At one timestep, optimize the frog–sneaker relation.</li>
<li>At another, optimize the sneaker–teapot relation.</li>
</ul>
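<p>For multi-relation prompts, the alternation can be expressed as a round-robin over pairwise constraints across steering steps. The text says the method alternates between relations but does not give the exact order, so the modulo schedule in <code>relationForStep</code> is a hypothetical choice.</p>

```swift
struct Constraint: Equatable {
    let subject: String
    let relation: String
    let object: String
}

/// Round-robin: the k-th steering step optimizes constraint k mod n.
func relationForStep(_ steeringIndex: Int, constraints: [Constraint]) -> Constraint? {
    guard !constraints.isEmpty else { return nil }
    return constraints[steeringIndex % constraints.count]
}

// "A frog above a sneaker below a teapot" split into two pairwise constraints.
let constraints = [
    Constraint(subject: "frog", relation: "above", object: "sneaker"),
    Constraint(subject: "sneaker", relation: "below", object: "teapot"),
]
let schedule = (0..<4).compactMap { relationForStep($0, constraints: constraints)?.object }
// schedule == ["sneaker", "teapot", "sneaker", "teapot"]
```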

NVIDIA's Learn-to-Steer is set to address a significant limitation in text-to-image diffusion models, which struggle with basic spatial reasoning. These models can create photorealistic images but often misplace objects in relation to one another, such as placing a dog to the left of a teddy bear instead of the right. This advancement aims to enhance the accuracy of generated images by improving spatial understanding.

What Is Learn-to-Steer? NVIDIA’s 2025 Spatial Fix for Text-to-Image Diffusion

<p>The $5tn firm handily beat expectations, but analysts are awaiting projections for future demand for the firm’s AI chips.</p><p>Nvidia shares are rising in after-market trading after the company posted third-quarter earnings that beat Wall Street estimates. All eyes were on Nvidia, the bellwether for the AI industry and the most valuable publicly traded company in the world, as analysts and investors hoped the chipmaker’s third-quarter earnings would assuage concerns about whether the high-flying valuations of AI firms have peaked.</p><p>“Blackwell sales are off the charts, and cloud GPUs are sold out,” said Jensen Huang, founder and CEO of Nvidia, in a press release. “Compute demand keeps accelerating and compounding across training and inference – each growing exponentially. We’ve entered the virtuous cycle of AI. The AI ecosystem is scaling fast – with more new foundation model makers, more AI startups, across more industries, and in more countries. AI is going everywhere, doing everything, all at once.”</p> <a href="https://www.theguardian.com/technology/2025/nov/19/nvidia-earning-report">Continue reading...</a>

Nvidia exceeded Wall Street expectations with its third-quarter earnings, showcasing strong demand for its AI chips. The company's shares rose in after-market trading, reflecting investor confidence amid concerns about the AI market's valuation. CEO Jensen Huang highlighted record sales and a rapidly expanding AI ecosystem, indicating a positive outlook for the company's future.

‘AI is going everywhere, doing everything:’ Nvidia beats Wall Street estimates amid market selloff and AI bubble fears

The SanDisk ExtremeFit USB-C flash drive is barely three grams, but offers 1TB of external storage and impressive speeds.

I refused to believe this coin-sized gadget was a storage drive, until I tried it for myself

<p>Swift 6.3 is bringing significant enhancements to Embedded Swift, the subset of Swift designed for resource-constrained environments like microcontrollers. Here's what's new:</p>

<h2>
  
  
  Key Improvements
</h2>

<h3>
  
  
  Libraries &amp; Standard Library
</h3>

<ul>
<li>
<strong>Floating-point printing</strong>: The <code>description</code> and <code>debugDescription</code> properties now work for Float, Double, and other floating-point types with a new all-Swift implementation</li>
<li>
<strong>Better diagnostics</strong>: New <code>EmbeddedRestrictions</code> diagnostic group warns about unsupported language constructs</li>
<li>
<strong>Swift MMIO 0.1.x</strong>: Includes code generation from SVD files and improved debugging with the SVD2LLDB plugin</li>
</ul>
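<p>In practice, the floating-point change means that everyday formatting code like the following should now be usable in Embedded Swift, according to the announcement. The snippet is ordinary Swift and also runs on a hosted toolchain. <code>formatReading</code> is a hypothetical helper, and the sketch assumes a target where <code>String</code> is available. It deliberately avoids Foundation, which Embedded Swift does not provide.</p>

```swift
/// Hypothetical helper that formats a sensor reading for a serial log.
func formatReading(label: String, value: Float) -> String {
    // Float.description is one of the APIs the announcement says now works in Embedded mode.
    return label + "=" + value.description
}

let line = formatReading(label: "temp", value: 21.5)
print(line)  // prints "temp=21.5"
```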

<h3>
  
  
  C Interoperability
</h3>

<ul>
<li>
<strong><code>@c</code> attribute</strong>: Define C-compatible functions and enums (from SE-0495)
</li>
</ul>

<div class="highlight js-code-highlight">
<pre class="highlight swift"><code><span class="kd">@c</span><span class="p">(</span><span class="kt">MyLib_initialize</span><span class="p">)</span>
<span class="kd">public</span> <span class="kd">func</span> <span class="nf">initialize</span><span class="p">()</span> <span class="p">{</span> <span class="cm">/* ... */</span> <span class="p">}</span>
</code></pre>

</div>



<ul>
<li>
<strong>Improved type matching</strong>: Better tolerance for mismatching C signatures, eliminating cryptic deserialization errors</li>
</ul>

<h3>
  
  
  Debugging
</h3>

<ul>
<li>
<strong>Enhanced LLDB support</strong>: Better value printing for Embedded Swift types</li>
<li>
<strong>Core dump inspection</strong>: Dictionary, Array, and other common types now inspectable without a live process</li>
<li>
<strong>ARMv7m exception unwinding</strong>: Complete backtraces through exception frames</li>
</ul>

<h3>
  
  
  Linking &amp; Compilation
</h3>

<ul>
<li>
<strong><code>@section</code> and <code>@used</code> attributes</strong>: Control where globals are emitted and ensure symbols aren't stripped (SE-0492)</li>
<li>
<strong>Weak symbol definitions</strong>: Fixes duplicate symbol errors in diamond dependencies</li>
<li>
<strong><code>@export</code> attribute</strong>: Better control over function visibility (SE-0497)</li>
</ul>




<p><em>Want to dive deeper? Read the <a href="https://www.swift.org/blog/embedded-swift-improvements-coming-in-swift-6.3/" rel="noopener noreferrer">full announcement</a> on Swift.org</em></p>

Swift 6.3 introduces significant upgrades to Embedded Swift, enhancing its functionality for resource-constrained environments like microcontrollers. Key improvements include new floating-point printing capabilities, better diagnostics with the EmbeddedRestrictions group, and the introduction of Swift MMIO 0.1.x for code generation and debugging.

Embedded Swift Gets Major Upgrades in Swift 6.3

<a href="https://www.techspot.com/news/110317-judge-dismisses-lawsuit-twice-due-alleged-deepfake-video.html" target="_blank"><img src="https://www.techspot.com/images2/news/ts3_thumbs/2025/11/2025-11-19-ts3_thumbs-252.jpg" width="800" height="560" style="padding: 15px 0" title="Judge dismisses lawsuit twice due to alleged deepfake video testimony" /></a><br />A California housing dispute is getting media attention over allegations that lawyers presented a deepfake video as witness testimony. NBC News reports that Judge Victoria Kolakowski became suspicious after the supposed witness showed signs that something was not right, including a monotone voice, fuzzy facial features, and repeated facial expressions....<br /><br /><a href="https://www.techspot.com/news/110317-judge-dismisses-lawsuit-twice-due-alleged-deepfake-video.html">Read Entire Article</a><br /><br />

A California housing dispute has drawn attention after allegations surfaced that lawyers presented a deepfake video as witness testimony. Judge Victoria Kolakowski expressed skepticism about the video, noting the witness's monotone voice, unclear facial features, and repetitive expressions. This led to the dismissal of the lawsuit on two occasions.

Judge dismisses lawsuit twice due to alleged deepfake video testimony

arXiv:2511.11966v1 Announce Type: cross 
Abstract: We study the problem of entropy calibration, which asks whether a language model's entropy over generations matches its log loss on human text. Past work found that models are miscalibrated, with entropy per step increasing (and text quality decreasing) as generations grow longer. This error accumulation is a fundamental problem in autoregressive models, and the standard solution is to truncate the distribution, which improves text quality at the cost of diversity. In this paper, we ask: is miscalibration likely to improve with scale, and is it theoretically possible to calibrate without tradeoffs? To build intuition, we first study a simplified theoretical setting to characterize the scaling behavior of miscalibration with respect to dataset size. We find that the scaling behavior depends on the power law exponent of the data distribution -- in particular, for a power law exponent close to 1, the scaling exponent is close to 0, meaning that miscalibration improves very slowly with scale. Next, we measure miscalibration empirically in language models ranging from 0.5B to 70B parameters. We find that the observed scaling behavior is similar to what is predicted by the simplified setting: our fitted scaling exponents for text are close to 0, meaning that larger models accumulate error at a similar rate as smaller ones. This scaling (or lack thereof) provides one explanation for why we sample from larger models with similar amounts of truncation as smaller models, even though the larger models are of higher quality. However, truncation is not a satisfying solution because it comes at the cost of increased log loss. In theory, is it even possible to reduce entropy while preserving log loss? We prove that it is possible, if we assume access to a black box which can fit models to predict the future entropy of text.
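The two quantities the abstract compares can be made concrete with a toy single-step calculation. The sketch assumes a three-token vocabulary, where "human" is the true next-token distribution and "model" is the model's prediction, both invented for illustration. The model's entropy is H(model) = -Σ model·log(model). Its log loss on human text is the cross-entropy -Σ human·log(model). Calibration asks whether the two match. The sketch also applies top-k truncation, one common truncation scheme (the abstract does not say which scheme is used), to show the trade-off described above: entropy falls, but log loss rises. Here it becomes infinite, because the truncated model assigns zero probability to a token that humans sometimes write.

```swift
import Foundation

/// Shannon entropy in nats; zero-probability tokens contribute nothing (0 · log 0 = 0).
func entropy(_ p: [Double]) -> Double {
    var h = 0.0
    for pk in p where pk > 0 { h -= pk * log(pk) }
    return h
}

/// Log loss (cross-entropy) of the model on human text: -Σ human · log(model).
/// Infinite if humans use a token the model gives zero probability.
func logLoss(human q: [Double], model p: [Double]) -> Double {
    var loss = 0.0
    for (humanP, modelP) in zip(q, p) where humanP > 0 {
        if modelP == 0 { return .infinity }
        loss -= humanP * log(modelP)
    }
    return loss
}

/// Top-k truncation: keep the k most likely tokens and renormalize.
func topK(_ p: [Double], k: Int) -> [Double] {
    let keep = Set(p.indices.sorted { p[$0] > p[$1] }.prefix(k))
    let kept = p.indices.map { keep.contains($0) ? p[$0] : 0.0 }
    let total = kept.reduce(0, +)
    return kept.map { $0 / total }
}

let human = [0.7, 0.2, 0.1]
let model = [0.5, 0.3, 0.2]
print(entropy(model), logLoss(human: human, model: model))  // ≈ 1.030 vs ≈ 0.887: miscalibrated
let truncated = topK(model, k: 2)
print(entropy(truncated), logLoss(human: human, model: truncated))  // entropy drops, log loss is inf
```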

The paper examines entropy calibration in language models, focusing on whether their entropy aligns with log loss on human text. Previous studies indicated that as text generation lengthens, entropy increases while text quality declines, highlighting a fundamental issue in autoregressive models. The authors investigate whether miscalibration can improve with scale and whether calibration without tradeoffs is theoretically feasible, analyzing the scaling behavior concerning dataset size and power law exponents.
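The scaling claims come down to fitting a power law, miscalibration ≈ C · n^(-alpha), where n is a scale variable such as dataset size. A fitted exponent alpha near 0 means the error barely shrinks as n grows. A standard way to estimate alpha is ordinary least squares on log-log data. The sketch below applies that method to synthetic points generated from a known exponent, so the numbers are illustrative and are not results from the paper. It also assumes that this simple fit matches the paper's fitting procedure, which the summary does not specify.

```swift
import Foundation

/// Fits y ≈ C · n^(-alpha) by ordinary least squares on (log n, log y).
/// Returns the estimated exponent alpha and prefactor C.
func fitPowerLaw(sizes: [Double], values: [Double]) -> (alpha: Double, c: Double) {
    let xs = sizes.map { log($0) }
    let ys = values.map { log($0) }
    let count = Double(xs.count)
    let meanX = xs.reduce(0, +) / count
    let meanY = ys.reduce(0, +) / count
    var sxy = 0.0, sxx = 0.0
    for (x, y) in zip(xs, ys) {
        sxy += (x - meanX) * (y - meanY)
        sxx += (x - meanX) * (x - meanX)
    }
    let slope = sxy / sxx
    let intercept = meanY - slope * meanX
    return (alpha: -slope, c: exp(intercept))  // the log-log slope is -alpha
}

// Synthetic data with a small true exponent (0.05): making n 100x larger
// shrinks the error only by a factor of 100^0.05 ≈ 1.26.
let sizes: [Double] = [1e6, 1e7, 1e8, 1e9]
let errors = sizes.map { 2.0 * pow($0, -0.05) }
let fit = fitPowerLaw(sizes: sizes, values: errors)
```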

On the Entropy Calibration of Language Models

arXiv:2506.22481v2 Announce Type: replace-cross 
Abstract: In recent years, significant advancements in the field of Natural Language Processing (NLP) have positioned commercialized language models as wide-reaching, highly useful tools. In tandem, there has been an explosion of multidisciplinary research examining how NLP tasks reflect, perpetuate, and amplify social biases such as gender and racial bias. A significant gap in this scholarship is a detailed analysis of how queer sexualities are encoded and (mis)represented by both NLP systems and practitioners. Following previous work in the field of AI fairness, we document how sexuality is defined and operationalized via a survey and analysis of 55 articles that quantify sexuality-based NLP bias. We find that sexuality is not clearly defined in a majority of the literature surveyed, indicating a reliance on assumed or normative conceptions of sexual/romantic practices and identities. Further, we find that methods for extracting biased outputs from NLP technologies often conflate gender and sexual identities, leading to monolithic conceptions of queerness and thus improper quantifications of bias. With the goal of improving sexuality-based NLP bias analyses, we conclude with recommendations that encourage more thorough engagement with both queer communities and interdisciplinary literature.

Recent advances in Natural Language Processing (NLP) have made language models widely used, prompting research into how these models reflect and amplify social biases such as gender and racial bias. However, little work analyzes how queer sexualities are represented in NLP systems. A survey of 55 articles that quantify sexuality-based NLP bias finds that sexuality is often poorly defined and rests on normative assumptions about sexual and romantic identities. This raises concerns about how sexuality is operationalized in NLP bias research.

Theories of "Sexuality" in Natural Language Processing Bias Research

arXiv:2503.11858v3 Announce Type: replace 
Abstract: Large Language Models (LLMs) have demonstrated great potential as evaluators of NLG systems, allowing for high-quality, reference-free, and multi-aspect assessments. However, existing LLM-based metrics suffer from two major drawbacks: reliance on proprietary models to generate training data or perform evaluations, and a lack of fine-grained, explanatory feedback. In this paper, we introduce OpeNLGauge, a fully open-source, reference-free NLG evaluation metric that provides accurate explanations based on error spans. OpeNLGauge is available as a two-stage ensemble of larger open-weight LLMs, or as a small fine-tuned evaluation model, with confirmed generalizability to unseen tasks, domains and aspects. Our extensive meta-evaluation shows that OpeNLGauge achieves competitive correlation with human judgments, outperforming state-of-the-art models on certain tasks while maintaining full reproducibility and providing explanations more than twice as accurate.

OpeNLGauge is a new open-source metric that uses Large Language Models (LLMs) to evaluate Natural Language Generation (NLG) systems. Existing LLM-based metrics rely on proprietary models and lack fine-grained feedback. OpeNLGauge, by contrast, is fully open and reference-free, and it explains its assessments with error spans. It is available either as a two-stage ensemble of larger open-weight LLMs or as a small fine-tuned evaluation model, and it generalizes to unseen tasks, domains, and aspects. In meta-evaluation it correlates competitively with human judgments and outperforms some state-of-the-art models while remaining fully reproducible.
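Meta-evaluation of an NLG metric usually reports how well the metric's scores correlate with human ratings of the same outputs. The sketch below computes Spearman's rank correlation, which is one common choice. The summary does not say which coefficients OpeNLGauge's meta-evaluation uses, so both the choice of coefficient and the function names are assumptions.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman(metric_scores, human_scores):
    """Spearman's rho: Pearson correlation computed on (tie-averaged) ranks."""
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))

# Hypothetical scores for four system outputs.
print(spearman([0.2, 0.9, 0.5, 0.7], [1, 5, 3, 3]))
```

Tied scores receive averaged ranks. If either list is constant, the coefficient is undefined, and this sketch raises `ZeroDivisionError`.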

OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs

arXiv:2511.14112v1 Announce Type: new 
Abstract: Automatic ICD coding from clinical text is a critical task in medical NLP but remains hindered by the extreme long-tail distribution of diagnostic codes. Thousands of rare and zero-shot ICD codes are severely underrepresented in datasets like MIMIC-III, leading to low macro-F1 scores. In this work, we propose a data-centric framework that generates high-quality synthetic discharge summaries to mitigate this imbalance. Our method constructs realistic multi-label code sets anchored on rare codes by leveraging real-world co-occurrence patterns, ICD descriptions, synonyms, taxonomy, and similar clinical notes. Using these structured prompts, we generate 90,000 synthetic notes covering 7,902 ICD codes, significantly expanding the training distribution. We fine-tune two state-of-the-art transformer-based models, PLM-ICD and GKI-ICD, on both the original and extended datasets. Experiments show that our approach modestly improves macro-F1 while maintaining strong micro-F1, outperforming prior SOTA. While the gain may seem marginal relative to the computational cost, our results demonstrate that carefully crafted synthetic data can enhance equity in long-tail ICD code prediction.
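The abstract describes anchoring multi-label code sets on rare codes using co-occurrence patterns. A small sketch can illustrate the idea. It counts how often pairs of codes appear on the same note, then grows a label set from a rare anchor by sampling partner codes in proportion to those counts. This is a hypothetical reconstruction for illustration only. The paper's procedure also draws on ICD descriptions, synonyms, taxonomy, and similar notes, none of which appear here. All function names are invented, and the codes in the example are placeholders.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations

def cooccurrence_counts(label_sets):
    """Count how often each pair of codes is assigned to the same note."""
    counts = defaultdict(Counter)
    for labels in label_sets:
        for a, b in combinations(sorted(set(labels)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def anchored_code_set(anchor, counts, max_partners=3, rng=None):
    """Hypothetical: start from a rare `anchor` code and add up to
    `max_partners` codes, sampled without replacement with probability
    proportional to their co-occurrence count with the anchor."""
    rng = rng or random.Random()
    pool = dict(counts.get(anchor, {}))  # .get avoids creating a new key
    chosen = [anchor]
    while pool and len(chosen) <= max_partners:
        codes = list(pool)
        pick = rng.choices(codes, weights=[pool[c] for c in codes], k=1)[0]
        chosen.append(pick)
        del pool[pick]
    return chosen

# Placeholder codes; each inner list is the code set of one real note.
notes = [["A", "B"], ["A", "C"], ["A", "B"], ["D", "E"]]
counts = cooccurrence_counts(notes)
print(anchored_code_set("A", counts, rng=random.Random(0)))
```

Each sampled set could then seed a structured prompt for generating one synthetic note. That prompting step is not sketched here.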

Automatic ICD coding from clinical text is essential in medical NLP but is hindered by the long-tail distribution of diagnostic codes. Many rare ICD codes are underrepresented in datasets such as MIMIC-III, which results in low macro-F1 scores. This work introduces a data-centric framework that generates synthetic discharge summaries to address the imbalance. Drawing on real-world co-occurrence patterns, ICD descriptions, synonyms, taxonomy, and similar clinical notes, the method produces 90,000 synthetic notes covering 7,902 ICD codes, expanding the training distribution. Fine-tuning of PLM-ICD and GKI-ICD models on these datasets shows mode…
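A toy multi-label example shows why a long tail depresses macro-F1 but not micro-F1. Micro-F1 pools true positives, false positives, and false negatives across all codes, so frequent codes dominate. Macro-F1 averages per-code F1 scores, so a missed rare code counts as much as a frequent one. This sketch averages over the codes that appear in either the gold or the predicted labels. Evaluation setups differ on which codes enter the macro average, so that choice is an assumption here.

```python
from collections import Counter

def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(gold, pred):
    """gold, pred: one set of codes per document. Returns (micro_f1, macro_f1)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    codes = set()
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        codes |= g | p
        for c in g & p:
            tp[c] += 1  # predicted and correct
        for c in p - g:
            fp[c] += 1  # predicted but not in gold
        for c in g - p:
            fn[c] += 1  # in gold but missed
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Assumption: macro average over every code seen in gold or predictions.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in codes) / len(codes) if codes else 0.0
    return micro, macro

# Toy long tail: frequent code "F" is always right; one rare code "R" is missed.
gold = [{"F"}] * 9 + [{"F", "R"}]
pred = [{"F"}] * 10
print(micro_macro_f1(gold, pred))  # micro = 20/21 ≈ 0.952, macro = 0.5
```

Missing a single rare code barely moves micro-F1 but halves macro-F1 in this toy case. This is why adding training signal for rare codes targets macro-F1 in particular.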

The Polite Liar: Epistemic Pathology in Language Models
