arXiv:2511.08923v1 Announce Type: new 
Abstract: Diffusion language models hold the promise of fast parallel generation, while autoregressive (AR) models typically excel in quality due to their causal structure aligning naturally with language modeling. This raises a fundamental question: can we achieve a synergy with high throughput, higher GPU utilization, and AR level quality? Existing methods fail to effectively balance these two aspects, either prioritizing AR using a weaker model for sequential drafting (speculative decoding), leading to lower drafting efficiency, or using some form of left-to-right (AR-like) decoding logic for diffusion, which still suffers from quality degradation and forfeits its potential parallelizability. We introduce TiDAR, a sequence-level hybrid architecture that drafts tokens (Thinking) in Diffusion and samples final outputs (Talking) AutoRegressively - all within a single forward pass using specially designed structured attention masks. This design exploits the free GPU compute density, achieving a strong balance between drafting and verification capacity. Moreover, TiDAR is designed to be serving-friendly (low overhead) as a standalone model. We extensively evaluate TiDAR against AR models, speculative decoding, and diffusion variants across generative and likelihood tasks at 1.5B and 8B scales. Thanks to the parallel drafting and sampling as well as exact KV cache support, TiDAR outperforms speculative decoding in measured throughput and surpasses diffusion models like Dream and Llada in both efficiency and quality. Most notably, TiDAR is the first architecture to close the quality gap with AR models while delivering 4.71x to 5.91x more tokens per second.

The introduction of TiDAR, a new hybrid language model, promises to enhance the efficiency of text generation by combining diffusion and autoregressive methods. This model aims to achieve high throughput and quality in generating language outputs, addressing the limitations of existing models that either sacrifice speed for quality or vice versa. TiDAR's innovative design allows for effective token drafting and final output sampling in a single forward pass, making it a significant advancement in AI language modeling.

TiDAR: Think in Diffusion, Talk in Autoregression

Google Gemini 3 has achieved a significant milestone by surpassing all existing AI benchmarks, including the most challenging ones. This accomplishment highlights the advancements made by Google's AI team and positions Gemini 3 as a leading contender in the artificial intelligence landscape. The success is celebrated within the tech community, reflecting the ongoing evolution of AI technologies and their capabilities.

Google Gemini 3 Just Killed Every AI Benchmark, Including the Hardest of All

Two weeks ago I read a line about tool use with Claude that stuck in my head. Paraphrased:

<blockquote>
Direct tool calls don’t really scale. 
Have the model write code that uses tools, and execute that code instead.
</blockquote>

At the same time, I was knee-deep in wiring a JavaScript execution environment into Contenox, my self-hosted runtime for deterministic, chat-native AI workflows.

So of course the thought was:

<blockquote>
What if I just let the model write the JavaScript and run it inside the runtime? 😅
</blockquote>

This post is about what happened when I tried exactly that.




<h2>What is Contenox?</h2>

Very short version:

<blockquote>
Contenox is a self-hostable runtime for sovereign GenAI applications. 
It models AI behavior as explicit state machines, not opaque prompt chains.
</blockquote>

Some key properties:

<ul>
<li>Runtime, not a library</li>
<li>Explicit state machines</li>
<li>Chat-native interface</li>
<li>Vendor-agnostic &amp; self-hosted</li>
<li>Written in Go with lots of passion and zero tolerance for shortcuts</li>
</ul>




<h2>The experiment: ask it to fetch and summarize a TODO</h2>

Once the JS execution hook was in place (a Goja VM with some globals exposed), I wired up a new state machine: 


<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>mux_input → moderate → generate_js → run_js → eval_js_result → (repair_js?) → answer
</code></pre>

</div>
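
The chain above can be read as a plain transition table. What follows is a minimal sketch, not Contenox's actual workflow format: the task names come from the diagram and from the trace discussed later (<code>run_js_retry</code>), while the outcome labels (<code>ok</code>, <code>safe</code>, <code>unsafe</code>) and the <code>walk</code> helper are assumptions made up for illustration.

```javascript
// Hypothetical transition table. Task names are from the post; the outcome
// labels and the "unsafe ends the run" choice are assumptions.
const transitions = {
  mux_input:      { ok: "moderate" },
  moderate:       { safe: "generate_js", unsafe: null },
  generate_js:    { ok: "run_js" },
  run_js:         { ok: "eval_js_result" },
  eval_js_result: { success: "answer", retry: "repair_js" },
  repair_js:      { ok: "run_js_retry" },
  run_js_retry:   { ok: "answer" },
  answer:         {},
};

// Follow the table from `start`, using the outcome recorded for each task.
// Stops when a task has no outgoing transition for its outcome.
function walk(outcomes, start = "mux_input") {
  const path = [start];
  let state = start;
  while (true) {
    const next = transitions[state][outcomes[state]];
    if (!next) break;
    path.push(next);
    state = next;
  }
  return path;
}

// Happy path: the evaluator says "success", so repair_js is skipped.
const happyPath = walk({
  mux_input: "ok",
  moderate: "safe",
  generate_js: "ok",
  run_js: "ok",
  eval_js_result: "success",
});
console.log(happyPath.join(" → "));
```

Because every transition is a table entry, the retry branch is just another row rather than a loop hidden in code.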



Then I opened the Contenox chat UI and typed:

<blockquote>
Me: “Hey Contenox, fetch the TODO item from jsonplaceholder and summarize it.”
</blockquote>

The first attempt didn’t go as planned.

Instead of a neat summary, I got:

<blockquote>
Contenox: 
“I'm sorry, but it seems there was a problem fetching the TODO item from the provided URL due to an Internal Server Error (status code: 500). Unfortunately, I couldn't retrieve any data at this time. My apologies for the inconvenience!”
</blockquote>

I checked manually – Cloudflare was having a moment, JSONPlaceholder was behind it, and the request really was failing. Okay, fair. Long coffee break.

A bit later, I went back to the same chat and typed:

<blockquote>
Me: “okay let's try again now.”
</blockquote>

This time the reply was:

<blockquote>
Contenox: 
<code>Task titled "delectus aut autem" is incomplete. Nothing to summarize here.</code>
</blockquote>

Wait. It worked? Just like that?




<h2>Under the hood: LLM → JavaScript → tools → LLM</h2>

Looking at the execution trace made it clear what happened.

<h3>1. Input is normalized and moderated</h3>

The workflow starts with:

<ol>
<li>
<code>mux_input</code>

<ul>
<li>Parses the incoming chat history into a normalized <code>messages</code> array.</li>
</ul>
</li>
<li>
<code>moderate</code>

<ul>
<li>Uses a small model to classify the input as safe/unsafe.</li>
<li>Output: <code>0</code> → safe → continue.</li>
</ul>
</li>
</ol>

So far this is standard workflow stuff.

<h3>2. <code>generate_js</code>: the LLM writes JavaScript</h3>

Next, the <code>generate_js</code> task runs with a system instruction like:

<ul>
<li>“You are an expert JavaScript agent for the Contenox runtime…”</li>
<li>“You must output JSON: { "code": "" }”</li>
<li>“You can use httpFetch, executeTask, executeTaskChain, …”</li>
<li>“Script must be synchronous, no async/await, and must return a JSON-serializable object.”</li>
</ul>
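
Those constraints are easy to check mechanically before anything reaches the sandbox. Here is a minimal sketch of such a check; <code>parsePlannerOutput</code> is a hypothetical helper, not part of Contenox, and the async/await test is a crude textual match that would also trip on those words inside strings or comments.

```javascript
// Hypothetical pre-flight check for the planner's raw output.
// Enforces the quoted rules: JSON with a "code" string, no async/await.
function parsePlannerOutput(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    return { ok: false, error: "output is not valid JSON: " + e.message };
  }
  if (parsed === null || typeof parsed.code !== "string" || parsed.code.trim() === "") {
    return { ok: false, error: 'missing non-empty "code" string' };
  }
  // Crude textual check; a stricter version would parse the script instead.
  if (/\b(async|await)\b/.test(parsed.code)) {
    return { ok: false, error: "script must be synchronous (no async/await)" };
  }
  return { ok: true, code: parsed.code };
}

console.log(parsePlannerOutput('{"code":"var result = { answer: \\"hi\\" }; result;"}'));
console.log(parsePlannerOutput("Sure! Here is the code you asked for"));
```

A failed check could feed straight into a repair step, the same way a failed run would.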

Given the chat history, the model produced something like this (simplified): 


<div class="highlight js-code-highlight">
<pre class="highlight json"><code>{
 "code": "var result;
var answer;
var meta = { steps: [] };

var todoUrl = \"https://jsonplaceholder.typicode.com/todos/1\";

// Step 1: Fetch the TODO item
meta.steps.push(\"Fetching TODO item from \" + todoUrl);
var fetchResponse = httpFetch(todoUrl);

if (!fetchResponse.ok) {
 answer = \"I couldn't access that item right now because there was some trouble with getting it. Please check the URL or try again later.\";
 meta.error = {
 message: \"HTTP fetch failed\",
 url: todoUrl,
 status: fetchResponse.status,
 statusText: fetchResponse.statusText,
 body: fetchResponse.body,
 error: fetchResponse.error
 };
 result = { answer: answer, meta: meta };
} else {
 var todoItem;
 try {
 todoItem = JSON.parse(fetchResponse.body);
 meta.raw = todoItem;
 } catch (e) {
 answer = \"I fetched the item, but I couldn't understand its format. It doesn't seem to be valid JSON.\";
 meta.error = {
 message: \"JSON parse error\",
 rawBody: fetchResponse.body,
 parseError: e.toString()
 };
 result = { answer: answer, meta: meta };
 }

 if (todoItem) {
 // Step 2: Summarize the TODO item using an LLM
 meta.steps.push(\"Summarizing TODO item with LLM\");
 var prompt = \"Summarize the following TODO item in a concise sentence: \" + JSON.stringify(todoItem);
 var summarizeTask = executeTask(prompt, \"gpt-4o-mini\", \"openai\");

 if (!summarizeTask.success) {
 answer = \"While I managed to fetch your TODO details, the summary tool had a hiccup and couldn't provide me with a concise sentence right now. We seem to be experiencing some temporary issues.\";
 meta.error = {
 message: \"LLM task execution failed\",
 task_id: summarizeTask.task_id,
 error: summarizeTask.error
 };
 result = { answer: answer, meta: meta };
 } else {
 answer = summarizeTask.response;
 meta.llmResponse = summarizeTask.response;
 result = { answer: answer, meta: meta };
 }
 }
}

result;"
}
</code></pre>

</div>



A few important details here:

<ul>
<li>
It encodes error handling:

<ul>
<li>If HTTP fails → user-friendly <code>answer</code>, raw error in <code>meta.error</code>.</li>
<li>If JSON parse fails → similar pattern.</li>
<li>If the inner LLM call fails → fallback message.</li>
</ul>


</li>

<li>It delegates the actual summarization to another model via <code>executeTask</code>.</li>

<li>It returns a structured <code>result</code> with both <code>answer</code> and <code>meta</code>.</li>

</ul>

This is not the model “calling tools” directly. It’s the model writing a program that calls tools.

<h3>3. <code>run_js</code>: execute the code in a sandbox</h3>

The next task is <code>run_js</code>, which is just a Contenox <code>hook</code> that calls the JS sandbox: 


<div class="highlight js-code-highlight">
<pre class="highlight json"><code>{
 "name": "js_sandbox",
 "tool_name": "execute_js",
 "args": {
 "code": "{{.generate_js.code}}"
 }
}
</code></pre>

</div>



Inside the trace you can see:

<ul>
<li>An <code>httpFetch</code> log for the JSONPlaceholder URL.</li>
<li>A response with <code>status: 200 OK</code> when things finally worked.</li>
<li>
An <code>executeTask</code> log with the summarization prompt:

<ul>
<li><code>Summarize the following TODO item in a concise sentence: {"userId":1,"id":1,"title":"delectus aut autem","completed":false}</code></li>
</ul>


</li>

</ul>

The sandbox result looked roughly like: 


<div class="highlight js-code-highlight">
<pre class="highlight json"><code>{
 "ok": true,
 "result": {
 "answer": "Task titled \"delectus aut autem\" is incomplete.",
 "meta": {
 "llmResponse": "Task titled \"delectus aut autem\" is incomplete.",
 "raw": {
 "userId": 1,
 "id": 1,
 "title": "delectus aut autem",
 "completed": false
 },
 "steps": [
 "Fetching TODO item from https://jsonplaceholder.typicode.com/todos/1",
 "Summarizing TODO item with LLM"
 ]
 }
 },
 "logs": [ ... ],
 "code": "var result; ..."
}
</code></pre>

</div>



<h3>4. <code>eval_js_result</code>: success or retry?</h3>

Now comes the evaluator:

<ul>
<li>It receives a description of the JS sandbox output.</li>
<li>
The system prompt is very strict:

<ul>
<li>If <code>ok</code> is true and there is a non-empty <code>result.answer</code> → respond with <code>success</code>.</li>
<li>Otherwise → respond with <code>retry</code>.</li>
</ul>


</li>

</ul>
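
The rule itself is small enough to write down as code. In the workflow it is applied by an LLM following the strict prompt; the sketch below is only a deterministic restatement of that rule, and <code>evaluateSandboxOutput</code> is a name made up for illustration. The input shape mirrors the sandbox result shown earlier.

```javascript
// Deterministic restatement of the evaluator's rule (hypothetical helper):
// "success" only if ok is true and result.answer is a non-empty string.
function evaluateSandboxOutput(output) {
  const answer = output && output.result && output.result.answer;
  const hasAnswer = typeof answer === "string" && answer.trim() !== "";
  return output && output.ok === true && hasAnswer ? "success" : "retry";
}

console.log(
  evaluateSandboxOutput({
    ok: true,
    result: { answer: 'Task titled "delectus aut autem" is incomplete.' },
  })
);
console.log(evaluateSandboxOutput({ ok: false, result: null }));
```

Note that a script which catches its own failure and still returns a friendly <code>answer</code> counts as a success under this rule.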

On the successful run, it answered: 


<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>success
</code></pre>

</div>



So the workflow does not go into <code>repair_js</code> or <code>run_js_retry</code>. Happy path.

<h3>5. <code>answer</code>: extract the final user message</h3>

The final task, <code>answer</code>, is intentionally boring:

<ul>
<li>System prompt: “You are a purely extractive post-processor. Do NOT invent content. Just surface the best existing <code>answer</code> field.”
</li>
<li>
It gets:

<ul>
<li>First run (<code>run_js</code> result).</li>
<li>Second run (<code>run_js_retry</code>), if any.</li>
</ul>


</li>

<li>

Selection rule:

<ul>
<li>Take the last non-empty <code>answer</code> you see.</li>
<li>Output it verbatim.</li>
</ul>


</li>

</ul>
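
The selection rule can be sketched the same way. In the workflow this step is an LLM with an extractive prompt; <code>selectAnswer</code> below is a hypothetical plain-code equivalent, assuming each run has the sandbox result shape shown earlier and that a skipped retry is passed as <code>null</code>.

```javascript
// Hypothetical equivalent of the extractive rule: walk the runs in order
// and keep the last non-empty answer, returned verbatim.
function selectAnswer(runs) {
  let selected = "";
  for (const run of runs) {
    const answer = run && run.result && run.result.answer;
    if (typeof answer === "string" && answer.trim() !== "") {
      selected = answer;
    }
  }
  return selected;
}

const firstRun = { ok: true, result: { answer: 'Task titled "delectus aut autem" is incomplete.' } };
const retryRun = null; // no run_js_retry happened on the happy path
console.log(selectAnswer([firstRun, retryRun]));
```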

In our case it found: 


<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>Task titled "delectus aut autem" is incomplete.
</code></pre>

</div>



And that’s exactly what Contenox replied in chat.




<h2>Why this is interesting (to me, at least)</h2>

What I originally set out to build:

<blockquote>
A runtime for deterministic, observable GenAI workflows. 
Tasks, transitions, hooks – all explicit and replayable.
</blockquote>

What I accidentally stumbled into:

<blockquote>
A multi-model, self-orchestrating agent pattern, 
where LLMs write code that uses tools, and the runtime executes and evaluates that code.
</blockquote>

The pattern looks like this:

<ol>
<li>
Planner LLM (<code>generate_js</code>)

<ul>
<li>Reads user intent + history.</li>
<li>Emits JavaScript that calls <code>httpFetch</code>, <code>executeTask</code>, <code>executeTaskChain</code>, hooks, etc.</li>
</ul>
</li>
<li>
Execution environment (<code>run_js</code> in Goja)

<ul>
<li>Deterministic execution of that JS.</li>
<li>Full logs of every HTTP call, every inner LLM call, every step.</li>
</ul>
</li>
<li>
Controller LLM (<code>eval_js_result</code>)

<ul>
<li>Looks at the sandbox result.</li>
<li>Decides: is this good enough? Retry? Repair?</li>
</ul>
</li>
<li>
Repair LLM (<code>repair_js</code>, if needed)

<ul>
<li>Gets the previous code + error output.</li>
<li>Writes a fixed version of the JS.</li>
</ul>
</li>
<li>
Answer LLM (<code>answer</code>)

<ul>
<li>Doesn’t “reason” at all.</li>
<li>Just extracts the final <code>answer</code> text safely.</li>
</ul>
</li>
</ol>

All of that is expressed as an explicit state machine in Contenox.

No hidden loops, no undocumented retries, no magic glue code inside some SDK. It’s all visible in the workflow graph and trace.




To me, that’s the exciting part:

<blockquote>
You don’t have to choose between “boring deterministic workflows” and “fancy agents”. 
You can build the agent on top of deterministic workflows. 
And everything stays self-hosted, inspectable, and auditable if you want.
</blockquote>

The article discusses an experiment where an AI model was allowed to write JavaScript code within a self-hosted runtime called Contenox. The author reflects on a concept regarding tool usage in AI, suggesting that models should generate code to utilize tools instead of direct calls. This approach was tested by executing the generated JavaScript within the Contenox environment, aiming to enhance the efficiency of AI workflows.

I Let an LLM Write JavaScript Inside My AI Runtime. Here’s What Happened

Caterpillar Inc. was always an unlikely winner in the artificial intelligence craze. It makes the bulk of its money selling the equipment like yellow earth movers that has made it a stalwart of American industry.

Caterpillar Inc. has emerged as an unlikely player in the artificial intelligence sector, primarily known for its manufacturing of heavy machinery such as earth movers. The company has historically focused on traditional industrial equipment, making it less aligned with the AI-driven technology trends that have captivated many other sectors. Despite the growing interest in AI, Caterpillar's core business remains rooted in physical machinery, which may limit its appeal in the rapidly evolving tech landscape.

Caterpillar’s Lone Bear Says Machinery Maker Is No AI Darling

Windows president Pavan Davuluri recently described the future of Windows as an agentic operating system, where AI bots and large language models handle the user's commands on files and computing tasks. Critics mostly greeted the idea with scorn, cursing, and frustration over the "bug-ridden slop pile" the OS currently is. <a href="https://www.techspot.com/news/110306-microsoft-explains-how-windows-11-become-agentic-os.html">Read Entire Article</a>

Windows president Pavan Davuluri has outlined plans for Windows 11 to evolve into what he describes as an 'agentic operating system.' This transformation will involve the integration of AI bots and large language models to manage user commands and computing tasks. However, the announcement has been met with skepticism and criticism from users who are frustrated with the current state of the operating system, which they describe as plagued by bugs.

Microsoft explains how Windows 11 will become an agentic OS whether you like it or not

Although Black Friday is still two weeks away, you can find great Nintendo Switch and Switch 2 deals now. I've collected the best from Walmart, Best Buy, and more.

As Black Friday approaches in two weeks, early deals on Nintendo Switch and Switch 2 consoles are already available. Major retailers like Walmart and Best Buy are offering over 20 sales, providing consumers with an opportunity to save on popular gaming products ahead of the holiday shopping rush.

Best early Black Friday Nintendo Switch deals 2025: 20+ sales out early

Alex Heath / <A HREF="https://sources.news/">Sources</A>: 
<A HREF="https://substack.com/redirect/2/eyJlIjoiaHR0cHM6Ly9zb3VyY2VzLm5ld3MvcC9kZW1pcy1oYXNzaWJhcy1vbi1nZW1pbmktMy13b3JsZD91dG1fY2FtcGFpZ249ZW1haWwtcG9zdCZyPTFyODVmJnRva2VuPWV5SjFjMlZ5WDJsa0lqb3lPVFE1T0RreExDSndiM04wWDJsa0lqb3hOemt5TlRnNE9UZ3NJbWxoZENJNk1UYzJNelE1TmpVME5pd2laWGh3SWpveE56WTJNRGc0TlRRMkxDSnBjM01pT2lKd2RXSXRNelV5TlRjNE1DSXNJbk4xWWlJNkluQnZjM1F0Y21WaFkzUnBiMjRpZlEucDdlcWFuMFM3WDNXTXQ1OUY4Y2RYZG1tb1VBRGJNMlBGZDM1c3ZZWUc2YyIsInAiOjE3OTI1ODg5OCwicyI6MzUyNTc4MCwiZiI6ZmFsc2UsInUiOjI5NDk4OTEsImlhdCI6MTc2MzQ5NjU0NiwiZXhwIjoyMDc5MDcyNTQ2LCJpc3MiOiJwdWItMCIsInN1YiI6ImxpbmstcmVkaXJlY3QifQ.MstA6dhKs3CLxJgLSEpyVK_D4Oz9SmcjeXKNnqEDzIg?">Q&amp;A with Demis Hassabis on Gemini 3, spending most of his research time on world models, fitting the entire Google Search index into Gemini, AI bubble, and more</A>&nbsp; &mdash;&nbsp; Demis Hassabis was noticeably relaxed when he joined our virtual call from London.&nbsp; &mdash;&nbsp; It was the day before the release of Gemini 3 &hellip;

Demis Hassabis, co-founder of DeepMind, discussed the advancements of Gemini 3, Google's latest AI model, emphasizing its capabilities in world modeling and fitting the entire Google Search index into the system. He addressed concerns about the AI bubble and highlighted the model's potential to enhance user interactions and provide more accurate information. Hassabis's insights reflect a commitment to pushing the boundaries of AI technology and its integration into everyday applications.

Q&A with Demis Hassabis on Gemini 3, spending most of his research time on world models, fitting the entire Google Search index into Gemini, AI bubble, and more (Alex Heath/Sources)

arXiv:2511.11966v1 Announce Type: cross 
Abstract: We study the problem of entropy calibration, which asks whether a language model's entropy over generations matches its log loss on human text. Past work found that models are miscalibrated, with entropy per step increasing (and text quality decreasing) as generations grow longer. This error accumulation is a fundamental problem in autoregressive models, and the standard solution is to truncate the distribution, which improves text quality at the cost of diversity. In this paper, we ask: is miscalibration likely to improve with scale, and is it theoretically possible to calibrate without tradeoffs? To build intuition, we first study a simplified theoretical setting to characterize the scaling behavior of miscalibration with respect to dataset size. We find that the scaling behavior depends on the power law exponent of the data distribution -- in particular, for a power law exponent close to 1, the scaling exponent is close to 0, meaning that miscalibration improves very slowly with scale. Next, we measure miscalibration empirically in language models ranging from 0.5B to 70B parameters. We find that the observed scaling behavior is similar to what is predicted by the simplified setting: our fitted scaling exponents for text are close to 0, meaning that larger models accumulate error at a similar rate as smaller ones. This scaling (or, lack thereof) provides one explanation for why we sample from larger models with similar amounts of truncation as smaller models, even though the larger models are of higher quality. However, truncation is not a satisfying solution because it comes at the cost of increased log loss. In theory, is it even possible to reduce entropy while preserving log loss? We prove that it is possible, if we assume access to a black box which can fit models to predict the future entropy of text.

The paper examines entropy calibration in language models: whether a model's entropy over its own generations matches its log loss on human text. Previous studies found that entropy per step increases and text quality declines as generations grow longer, a fundamental issue in autoregressive models. The authors investigate whether miscalibration improves with scale and whether calibration without tradeoffs is theoretically possible, analyzing how miscalibration scales with dataset size and with the power law exponent of the data distribution.

On the Entropy Calibration of Language Models
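
To make the calibration question concrete, here is a minimal single-step sketch in Python. It is not the paper's multi-step setup: the two distributions `p` (standing in for human text) and `q` (standing in for the model) are invented toy values, and `truncate_top_k` is a hypothetical helper used to illustrate truncation. The sketch compares the model's entropy with its log loss, then shows that truncation lowers entropy while raising log loss.

```python
import math

def entropy(q):
    """Shannon entropy (in nats) of the model distribution q."""
    return -sum(x * math.log(x) for x in q if x > 0)

def log_loss(p, q):
    """Expected log loss (cross-entropy) of model q on tokens drawn from p.

    Returns infinity if q assigns zero probability to a token p can emit.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue
        if q_i == 0:
            return math.inf
        total -= p_i * math.log(q_i)
    return total

def truncate_top_k(q, k):
    """Keep the k most probable tokens of q and renormalize (hypothetical helper)."""
    keep = set(sorted(range(len(q)), key=lambda i: q[i], reverse=True)[:k])
    mass = sum(q[i] for i in keep)
    return [q[i] / mass if i in keep else 0.0 for i in range(len(q))]

# Toy values, assumed for illustration only.
p = [0.70, 0.20, 0.05, 0.05]  # stand-in for the human next-token distribution
q = [0.40, 0.30, 0.20, 0.10]  # stand-in for a model with too much tail mass

# Calibration gap: positive means the model is more uncertain than its log loss implies.
gap = entropy(q) - log_loss(p, q)

# Truncation reduces entropy, but the log loss on human text goes up.
q_trunc = truncate_top_k(q, k=2)

print(f"entropy(q)       = {entropy(q):.3f}")
print(f"log_loss(p, q)   = {log_loss(p, q):.3f}")
print(f"calibration gap  = {gap:.3f}")
print(f"entropy(q_trunc) = {entropy(q_trunc):.3f}")
print(f"log_loss(p, q_trunc) = {log_loss(p, q_trunc)}")
```

In this toy case the truncated model's log loss is infinite because it gives zero probability to tokens that human text still uses, which is an extreme instance of the tradeoff the abstract describes.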

arXiv:2510.24021v2 Announce Type: replace 
Abstract: Knowledge distillation (KD) is a standard route to compress Large Language Models (LLMs) into compact students, yet most pipelines uniformly apply token-wise loss regardless of teacher confidence. This indiscriminate supervision amplifies noisy, high-entropy signals and is especially harmful under large teacher-student capacity gaps. We introduce SelecTKD, a plug-and-play Selective Token-Weighted distillation framework that shifts the focus from "how to measure divergence" to "where to apply learning". At each step, the student proposes tokens that are verified by the teacher through a robust propose-and-verify procedure with two variants: greedy Top-k and non-greedy Spec-k. Accepted tokens receive full loss, while rejected tokens are masked or down-weighted. This objective-agnostic design works with on- and off-policy data, induces an implicit curriculum quantified by Token Acceptance Rate (TAR), and stabilizes optimization. Across instruction following, mathematical reasoning, code generation, and a VLM setting, SelecTKD consistently improves strong baselines and achieves state-of-the-art results for small models without architectural changes or extra reference models.

SelecTKD: Selective Token-Weighted Knowledge Distillation for LLMs
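
The propose-and-verify idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the acceptance rule (the student's greedy token must lie in the teacher's top-k), the per-token KL objective, the averaging over all positions, and every name below (`selective_kd_loss`, `rejected_weight`, the toy probabilities) are hypothetical choices. The abstract only states that accepted tokens receive full loss and rejected tokens are masked or down-weighted.

```python
import math

def top_k_ids(probs, k):
    """Indices of the k most probable tokens."""
    return set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])

def kl_divergence(teacher, student):
    """KL(teacher || student) for one position; student probabilities must be > 0."""
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)

def selective_kd_loss(student_probs, teacher_probs, k=1, rejected_weight=0.0):
    """Token-weighted distillation loss with a greedy top-k acceptance check.

    Assumed rule: the student proposes its argmax token at each position, and
    the teacher accepts it if that token is among the teacher's top-k.
    Returns (mean weighted loss, token acceptance rate).
    """
    weighted, accepted_count = 0.0, 0
    for s, t in zip(student_probs, teacher_probs):
        proposal = max(range(len(s)), key=lambda i: s[i])  # student proposes
        accepted = proposal in top_k_ids(t, k)             # teacher verifies
        accepted_count += accepted
        weight = 1.0 if accepted else rejected_weight      # full loss vs. mask/down-weight
        weighted += weight * kl_divergence(t, s)
    n = len(student_probs)
    return weighted / n, accepted_count / n

# Toy per-position distributions over a 3-token vocabulary (invented values).
student = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
teacher = [[0.7, 0.2, 0.1], [0.5, 0.1, 0.4], [0.3, 0.3, 0.4]]

masked_loss, tar = selective_kd_loss(student, teacher, k=1, rejected_weight=0.0)
full_loss, _ = selective_kd_loss(student, teacher, k=1, rejected_weight=1.0)
print(f"TAR = {tar:.2f}, masked loss = {masked_loss:.4f}, uniform loss = {full_loss:.4f}")
```

Setting `rejected_weight=1.0` recovers uniform token-wise distillation, so the same function serves as its own baseline for comparison.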

arXiv:2511.13368v1 Announce Type: new 
Abstract: Large language models (LLMs) perform strongly across tasks and languages, yet how improvements in one task or language affect other tasks and languages and their combinations remains poorly understood. We conduct a controlled PEFT/LoRA study across multiple open-weight LLM families and sizes, treating task and language as transfer axes while conditioning on model family and size; we fine-tune each model on a single task-language source and measure transfer as the percentage-point change versus its baseline score when evaluated on all other task-language target pairs. We decompose transfer into (i) Matched-Task (Cross-Language), (ii) Matched-Language (Cross-Task), and (iii) Cross-Task (Cross-Language) regimes. We uncover two consistent general patterns. First, a pronounced on-task vs. off-task asymmetry: Matched-Task (Cross-Language) transfer is reliably positive, whereas off-task transfer often incurs collateral degradation. Second, a stable donor-recipient structure across languages and tasks (hub donors vs. brittle recipients). We outline implications for risk-aware fine-tuning and model specialisation.

Donors and Recipients: On Asymmetric Transfer Across Tasks and Languages with Parameter-Efficient Fine-Tuning
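
The transfer measurement described in the abstract can be sketched as follows. The task names, languages, and scores are invented toy values chosen only to mirror the direction of the reported pattern, and the function names are hypothetical; only the percentage-point definition and the three regimes come from the abstract.

```python
from collections import defaultdict

def regime(source, target):
    """Classify a (task, language) target relative to the fine-tuning source."""
    s_task, s_lang = source
    t_task, t_lang = target
    if s_task == t_task and s_lang != t_lang:
        return "Matched-Task (Cross-Language)"
    if s_task != t_task and s_lang == t_lang:
        return "Matched-Language (Cross-Task)"
    if s_task != t_task and s_lang != t_lang:
        return "Cross-Task (Cross-Language)"
    return None  # the source pair itself is not a transfer target

def transfer_by_regime(source, baseline, finetuned):
    """Mean percentage-point change versus baseline, grouped by regime."""
    buckets = defaultdict(list)
    for target, base_score in baseline.items():
        name = regime(source, target)
        if name is not None:
            buckets[name].append(finetuned[target] - base_score)
    return {name: sum(deltas) / len(deltas) for name, deltas in buckets.items()}

# Toy scores in percent (assumed values, not results from the paper).
source = ("qa", "en")
baseline = {("qa", "en"): 60.0, ("qa", "de"): 50.0,
            ("nli", "en"): 70.0, ("nli", "de"): 65.0}
finetuned = {("qa", "en"): 72.0, ("qa", "de"): 54.0,
             ("nli", "en"): 68.0, ("nli", "de"): 62.0}

result = transfer_by_regime(source, baseline, finetuned)
for name, delta in result.items():
    print(f"{name}: {delta:+.1f} pp")
```

Repeating this for every source pair would give a source-by-target matrix of deltas, from which donor and recipient tendencies could be read off by row and column.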

arXiv:2511.11878v1 Announce Type: new 
Abstract: While large language models (LLMs) show transformative potential in healthcare, their development remains focused on high-resource languages, creating a critical barrier for others as simple translation fails to capture unique clinical and cultural nuances, such as endemic diseases. To address this, we introduce MedPT, the first large-scale, real-world corpus for Brazilian Portuguese, comprising 384,095 authentic question-answer pairs from patient-doctor interactions. The dataset underwent a meticulous multi-stage curation protocol, using a hybrid quantitative-qualitative analysis to filter noise and contextually enrich thousands of ambiguous queries. We further augmented the corpus via LLM-driven annotation, classifying questions into seven semantic types to capture user intent. Our analysis reveals its thematic breadth (3,200 topics) and unique linguistic properties, like the natural asymmetry in patient-doctor communication. To validate its utility, we benchmark a medical specialty routing task: fine-tuning a 1.7B parameter model achieves an outstanding 94% F1-score on a 20-class setup. Furthermore, our qualitative error analysis shows misclassifications are not random but reflect genuine clinical ambiguities (e.g., between comorbid conditions), proving the dataset's deep semantic richness. We publicly release MedPT to foster the development of more equitable, accurate, and culturally-aware medical technologies for the Portuguese-speaking world.

TiDAR: Think in Diffusion, Talk in Autoregression
