arXiv:2511.12309v1 Announce Type: cross 
Abstract: Self-consistency (SC) is a widely used test-time inference technique for improving performance in chain-of-thought reasoning. It involves generating multiple responses, or samples, from a large language model (LLM) and selecting the most frequent answer. This procedure can naturally be viewed as a majority vote or empirical mode estimation. Despite its effectiveness, SC is prohibitively expensive at scale when naively applied to datasets, and it lacks a unified theoretical treatment of sample efficiency and scaling behavior. In this paper, we provide the first comprehensive analysis of SC's scaling behavior and its variants, drawing on mode estimation and voting theory. We derive and empirically validate power law scaling for self-consistency across datasets, and analyze the sample efficiency for fixed-allocation and dynamic-allocation sampling schemes. From these insights, we introduce Blend-ASC, a novel variant of self-consistency that dynamically allocates samples to questions during inference, achieving state-of-the-art sample efficiency. Our approach uses 6.8x fewer samples than vanilla SC on average, outperforming both fixed- and dynamic-allocation SC baselines, thereby demonstrating the superiority of our approach in terms of efficiency. In contrast to existing variants, Blend-ASC is hyperparameter-free and can fit an arbitrary sample budget, ensuring it can be easily applied to any self-consistency application.
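Since the abstract frames self-consistency as a majority vote over sampled answers, here is a minimal JavaScript sketch of that vote, assuming final answers have already been extracted from each chain-of-thought sample as strings. `sampleAnswer` is a hypothetical stand-in for one LLM call. The early-stopping rule is a simple illustrative one (stop once the leader cannot be overtaken within the remaining budget); it is not Blend-ASC, whose allocation rule the abstract does not spell out.

```javascript
// Self-consistency as empirical mode estimation: count final answers and
// return the most frequent one. Ties go to the answer that was seen first
// (Map preserves insertion order).
function majorityVote(answers) {
  const counts = new Map();
  for (const a of answers) counts.set(a, (counts.get(a) || 0) + 1);
  let best = null;
  let bestCount = 0;
  for (const [answer, count] of counts) {
    if (count > bestCount) {
      best = answer;
      bestCount = count;
    }
  }
  return { answer: best, votes: bestCount, total: answers.length };
}

// Samples up to `budget` answers, stopping early once the leader's margin over
// the runner-up exceeds the number of samples left (the vote can no longer flip).
// `sampleAnswer(i)` is a hypothetical stand-in for one LLM call plus answer extraction.
// This stopping rule is illustrative only; it is NOT Blend-ASC.
function selfConsistency(sampleAnswer, budget) {
  const answers = [];
  const counts = new Map();
  for (let i = 0; i < budget; i++) {
    const a = sampleAnswer(i);
    answers.push(a);
    counts.set(a, (counts.get(a) || 0) + 1);
    const sorted = [...counts.values()].sort((x, y) => y - x);
    const margin = sorted[0] - (sorted[1] || 0);
    if (margin > budget - answers.length) break;
  }
  return majorityVote(answers);
}
```

With `budget = 5` and draws `42, 42, 42, 7, 42`, the loop stops after three samples: a three-vote lead cannot be overturned by two more draws, so stopping never changes the answer a full-budget vote would return.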

Optimal Self-Consistency for Efficient Reasoning with Large Language Models

Google Gemini 3 has achieved a significant milestone by surpassing all existing AI benchmarks, including the most challenging ones. This accomplishment highlights the advancements made by Google's AI team and positions Gemini 3 as a leading contender in the artificial intelligence landscape. The success is celebrated within the tech community, reflecting the ongoing evolution of AI technologies and their capabilities.

Google Gemini 3 Just Killed Every AI Benchmark, Including the Hardest of All

Two weeks ago I read a line about tool use with Claude that stuck in my head. Paraphrased:

<blockquote>
Direct tool calls don’t really scale. 
Have the model write code that uses tools, and execute that code instead.
</blockquote>

At the same time, I was knee-deep in wiring a JavaScript execution environment into Contenox, my self-hosted runtime for deterministic, chat-native AI workflows.

So of course the thought was:

<blockquote>
What if I just let the model write the JavaScript and run it inside the runtime? 😅
</blockquote>

This post is about what happened when I tried exactly that.




<h2>
 
 
 What is Contenox?
</h2>

Very short version:

<blockquote>
Contenox is a self-hostable runtime for sovereign GenAI applications. 
It models AI behavior as explicit state machines, not opaque prompt chains.
</blockquote>

Some key properties:

<ul>
<li>Runtime, not a library</li>
<li>Explicit state machines</li>
<li>Chat-native interface</li>
<li>Vendor-agnostic &amp; self-hosted</li>
<li>Written in Go, with lots of passion and zero tolerance for shortcuts</li>
</ul>




<h2>
 
 
 The experiment: ask it to fetch and summarize a TODO
</h2>

Once the JS execution hook was in place (a Goja VM with some globals exposed), I wired up a new state machine: 


<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>mux_input → moderate → generate_js → run_js → eval_js_result → (repair_js?) → answer
</code></pre>

</div>
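That pipeline is a small finite-state machine. Here's a sketch of its transitions as a plain JavaScript table, assuming each task produces a text output and only `moderate` and `eval_js_result` branch. The `run_js_retry` state comes from the trace mentioned further down; the "end" branch for unsafe input, the table format, and `runWorkflow` are hypothetical, not Contenox's actual configuration schema (Contenox itself is written in Go).

```javascript
// Hypothetical transition table for
// mux_input → moderate → generate_js → run_js → eval_js_result → (repair_js?) → answer.
// Each entry maps a state to a function of that state's text output.
const transitions = {
  mux_input: () => "moderate",
  // moderate outputs "0" for safe input; ending on anything else is an assumption.
  moderate: (out) => (out.trim() === "0" ? "generate_js" : "end"),
  generate_js: () => "run_js",
  run_js: () => "eval_js_result",
  eval_js_result: (out) => (out.trim() === "success" ? "answer" : "repair_js"),
  repair_js: () => "run_js_retry",
  run_js_retry: () => "answer", // assumed: one repair attempt, then answer
  answer: () => "end",
};

// Walks the table from mux_input, calling runTask(state) for each task,
// and returns the visited states (a minimal execution trace).
function runWorkflow(runTask, maxSteps = 20) {
  const trace = [];
  let state = "mux_input";
  while (state !== "end" && trace.length < maxSteps) {
    const output = String(runTask(state));
    trace.push(state);
    const next = transitions[state];
    if (!next) throw new Error("no transition defined for state " + state);
    state = next(output);
  }
  return trace;
}
```

Feeding it a stub where `eval_js_result` returns `retry` yields the trace `mux_input, moderate, generate_js, run_js, eval_js_result, repair_js, run_js_retry, answer`. Keeping the graph as data instead of nested control flow is what keeps every branch visible in a trace.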



Then I opened the Contenox chat UI and typed:

<blockquote>
Me: “Hey Contenox, fetch the TODO item from jsonplaceholder and summarize it.”
</blockquote>

The first attempt didn’t go as planned.

Instead of a neat summary, I got:

<blockquote>
Contenox: 
“I'm sorry, but it seems there was a problem fetching the TODO item from the provided URL due to an Internal Server Error (status code: 500). Unfortunately, I couldn't retrieve any data at this time. My apologies for the inconvenience!”
</blockquote>

I checked manually – Cloudflare was having a moment, JSONPlaceholder was behind it, and the request really was failing. Okay, fair. Long coffee break.

A bit later, I went back to the same chat and typed:

<blockquote>
Me: “okay let's try again now.”
</blockquote>

This time the reply was:

<blockquote>
Contenox: 
<code>Task titled "delectus aut autem" is incomplete. Nothing to summarize here.</code>
</blockquote>

Wait. It worked? Just like that?




<h2>
 
 
 Under the hood: LLM → JavaScript → tools → LLM
</h2>

Looking at the execution trace made it clear what happened.

<h3>
 
 
 1. Input is normalized and moderated
</h3>

The workflow starts with:

<ol>
<li><code>mux_input</code>
<ul>
<li>Parses the incoming chat history into a normalized <code>messages</code> array.</li>
</ul>
</li>
<li><code>moderate</code>
<ul>
<li>Uses a small model to classify the input as safe/unsafe.</li>
<li>Output: <code>0</code> → safe → continue.</li>
</ul>
</li>
</ol>

So far this is standard workflow stuff.

<h3>
 
 
 2. <code>generate_js</code>: the LLM writes JavaScript
</h3>

Next, the <code>generate_js</code> task runs with a system instruction like:

<ul>
<li>“You are an expert JavaScript agent for the Contenox runtime…”</li>
<li>“You must output JSON: { "code": "" }”</li>
<li>“You can use httpFetch, executeTask, executeTaskChain, …”</li>
<li>“Script must be synchronous, no async/await, and must return a JSON-serializable object.”</li>
</ul>
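That output contract is easy to check mechanically before anything runs. A minimal sketch of such a gate, assuming the model's raw reply arrives as a string; `parseGeneratedJs` is a hypothetical helper, and the `async`/`await` check is a naive regular expression that will also flag those words inside string literals or comments.

```javascript
// Hypothetical gate for the generate_js contract: the reply must be JSON of the
// form { "code": "<script>" } and the script must look synchronous.
function parseGeneratedJs(reply) {
  let parsed;
  try {
    parsed = JSON.parse(reply);
  } catch (e) {
    return { ok: false, error: "reply is not valid JSON: " + e.message };
  }
  // Check for null first: JSON.parse("null") is valid but has no fields.
  if (parsed === null || typeof parsed.code !== "string" || parsed.code.trim() === "") {
    return { ok: false, error: 'missing non-empty string field "code"' };
  }
  // Naive synchronicity check: rejects any mention of these tokens.
  if (/\b(async|await)\b|\bPromise\b/.test(parsed.code)) {
    return { ok: false, error: "script must be synchronous" };
  }
  return { ok: true, code: parsed.code };
}
```

One practical consequence: a reply with literal line breaks inside the `code` string fails `JSON.parse`, because JSON strings must escape newlines as `\n`. The simplified snippet below shows the script with real line breaks for readability only.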

Given the chat history, the model produced something like this (simplified):


<div class="highlight js-code-highlight">
<pre class="highlight json"><code>{
  "code": "var result;
var answer;
var meta = { steps: [] };

var todoUrl = \"https://jsonplaceholder.typicode.com/todos/1\";

// Step 1: Fetch the TODO item
meta.steps.push(\"Fetching TODO item from \" + todoUrl);
var fetchResponse = httpFetch(todoUrl);

if (!fetchResponse.ok) {
  answer = \"I couldn't access that item right now because there was some trouble with getting it. Please check the URL or try again later.\";
  meta.error = {
    message: \"HTTP fetch failed\",
    url: todoUrl,
    status: fetchResponse.status,
    statusText: fetchResponse.statusText,
    body: fetchResponse.body,
    error: fetchResponse.error
  };
  result = { answer: answer, meta: meta };
} else {
  var todoItem;
  try {
    todoItem = JSON.parse(fetchResponse.body);
    meta.raw = todoItem;
  } catch (e) {
    answer = \"I fetched the item, but I couldn't understand its format. It doesn't seem to be valid JSON.\";
    meta.error = {
      message: \"JSON parse error\",
      rawBody: fetchResponse.body,
      parseError: e.toString()
    };
    result = { answer: answer, meta: meta };
  }

  if (todoItem) {
    // Step 2: Summarize the TODO item using an LLM
    meta.steps.push(\"Summarizing TODO item with LLM\");
    var prompt = \"Summarize the following TODO item in a concise sentence: \" + JSON.stringify(todoItem);
    var summarizeTask = executeTask(prompt, \"gpt-4o-mini\", \"openai\");

    if (!summarizeTask.success) {
      answer = \"While I managed to fetch your TODO details, the summary tool had a hiccup and couldn't provide me with a concise sentence right now. We seem to be experiencing some temporary issues.\";
      meta.error = {
        message: \"LLM task execution failed\",
        task_id: summarizeTask.task_id,
        error: summarizeTask.error
      };
      result = { answer: answer, meta: meta };
    } else {
      answer = summarizeTask.response;
      meta.llmResponse = summarizeTask.response;
      result = { answer: answer, meta: meta };
    }
  }
}

result;"
}
</code></pre>

</div>



A few important details here:

<ul>
<li>
It encodes error handling:

<ul>
<li>If HTTP fails → user-friendly <code>answer</code>, raw error in <code>meta.error</code>.</li>
<li>If JSON parse fails → similar pattern.</li>
<li>If the inner LLM call fails → fallback message.</li>
</ul>


</li>

<li>It delegates the actual summarization to another model via <code>executeTask</code>.</li>

<li>It returns a structured <code>result</code> with both <code>answer</code> and <code>meta</code>.</li>

</ul>

This is not the model “calling tools” directly. It’s the model writing a program that calls tools.

<h3>
 
 
 3. <code>run_js</code>: execute the code in a sandbox
</h3>

The next task is <code>run_js</code>, which is just a Contenox <code>hook</code> that calls the JS sandbox: 


<div class="highlight js-code-highlight">
<pre class="highlight json"><code>{
 "name": "js_sandbox",
 "tool_name": "execute_js",
 "args": {
 "code": "{{.generate_js.code}}"
 }
}
</code></pre>

</div>
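The `{{.generate_js.code}}` placeholder pulls a field from an earlier task's output into the hook arguments (the syntax resembles Go templates). Here's a sketch of that substitution, assuming task outputs are kept in a plain object keyed by task name; `resolveArgs` is hypothetical, not Contenox's implementation. Substituting into the already-parsed argument values, instead of splicing text into raw JSON, sidesteps having to escape the quotes and newlines in generated code.

```javascript
// Hypothetical resolver for hook args such as { "code": "{{.generate_js.code}}" }.
// Only whole-value placeholders of the form {{.task.field...}} are supported.
const PLACEHOLDER = /^\{\{\s*\.([A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*)\s*\}\}$/;

// Follows a dotted path like "generate_js.code" through nested objects.
function lookup(outputs, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    outputs
  );
}

function resolveArgs(args, outputs) {
  const resolved = {};
  for (const [name, value] of Object.entries(args)) {
    const m = typeof value === "string" ? value.match(PLACEHOLDER) : null;
    if (!m) {
      resolved[name] = value; // literal argument, passed through unchanged
      continue;
    }
    const found = lookup(outputs, m[1]);
    if (found === undefined) throw new Error("unresolved placeholder: " + value);
    resolved[name] = found; // the value itself, so no JSON escaping is needed
  }
  return resolved;
}
```

Failing loudly on an unresolved placeholder is a deliberate choice here: silently passing an empty string to the sandbox would just produce a confusing downstream error.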



Inside the trace you can see:

<ul>
<li>An <code>httpFetch</code> log for the JSONPlaceholder URL.</li>
<li>A response with <code>status: 200 OK</code> when things finally worked.</li>
<li>
An <code>executeTask</code> log with the summarization prompt:

<ul>
<li><code>Summarize the following TODO item in a concise sentence: {"userId":1,"id":1,"title":"delectus aut autem","completed":false}</code></li>
</ul>


</li>

</ul>

The sandbox result looked roughly like: 


<div class="highlight js-code-highlight">
<pre class="highlight json"><code>{
 "ok": true,
 "result": {
 "answer": "Task titled \"delectus aut autem\" is incomplete.",
 "meta": {
 "llmResponse": "Task titled \"delectus aut autem\" is incomplete.",
 "raw": {
 "userId": 1,
 "id": 1,
 "title": "delectus aut autem",
 "completed": false
 },
 "steps": [
 "Fetching TODO item from https://jsonplaceholder.typicode.com/todos/1",
 "Summarizing TODO item with LLM"
 ]
 }
 },
 "logs": [ ... ],
 "code": "var result; ..."
}
</code></pre>

</div>
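To make that result shape concrete, here's a sketch of a `run_js`-style executor in Node.js. It uses Node's built-in `vm` module as a stand-in for the Goja VM: both hand back the value of the script's last statement, which is why the generated code ends with `result;`. Every host call is logged and everything is wrapped into `{ ok, result, logs, code }`. The `httpFetch`/`executeTask` stubs only mimic the fields the generated script reads and are assumptions, not Contenox's real host functions. Node's `vm` module is not a security boundary, so treat this as an illustration only.

```javascript
const vm = require("node:vm");

// Runs generated code with the given host functions exposed as globals.
// Every host call is logged; exceptions become { ok: false, error }.
function runJs(code, hostFns, timeoutMs = 1000) {
  const logs = [];
  const context = {};
  for (const [name, fn] of Object.entries(hostFns)) {
    context[name] = (...args) => {
      const out = fn(...args);
      logs.push({ fn: name, args: args, out: out });
      return out;
    };
  }
  try {
    // Returns the result of the very last statement executed in the script.
    const value = vm.runInNewContext(code, context, { timeout: timeoutMs });
    // A JSON round-trip yields a plain copy; a top-level undefined (no final
    // expression) makes JSON.parse throw and is reported as a failure.
    const result = JSON.parse(JSON.stringify(value));
    return { ok: true, result, logs, code };
  } catch (e) {
    return { ok: false, error: String(e), logs, code };
  }
}

// Hypothetical host stubs shaped after the fields the generated script reads.
const stubs = {
  httpFetch: (url) => ({
    ok: true,
    status: 200,
    statusText: "OK",
    body: JSON.stringify({ userId: 1, id: 1, title: "delectus aut autem", completed: false }),
  }),
  executeTask: (prompt, model, provider) => ({
    success: true,
    task_id: "task-1",
    response: 'Task titled "delectus aut autem" is incomplete.',
  }),
};

// A shortened, happy-path-only version of the generated script.
const code = [
  'var r = httpFetch("https://jsonplaceholder.typicode.com/todos/1");',
  "var todo = JSON.parse(r.body);",
  'var s = executeTask("Summarize: " + JSON.stringify(todo), "gpt-4o-mini", "openai");',
  "var result = { answer: s.response, meta: { raw: todo } };",
  "result;",
].join("\n");

const out = runJs(code, stubs);
console.log(out.ok, out.result.answer);
```

Swapping the stubbed `httpFetch` for one that returns `{ ok: false, status: 500 }` is a quick way to exercise the error branch of the full generated script without waiting for an upstream outage.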



<h3>
 
 
 4. <code>eval_js_result</code>: success or retry?
</h3>

Now comes the evaluator:

<ul>
<li>It receives a description of the JS sandbox output.</li>
<li>
The system prompt is very strict:

<ul>
<li>If <code>ok</code> is true and there is a non-empty <code>result.answer</code> → respond with <code>success</code>.</li>
<li>Otherwise → respond with <code>retry</code>.</li>
</ul>


</li>

</ul>

On the successful run, it answered: 


<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>success
</code></pre>

</div>



So the workflow does not go into <code>repair_js</code> or <code>run_js_retry</code>. Happy path.
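The evaluator's rule is simple enough to state as code. Here's a deterministic JavaScript equivalent, for illustration only: in the workflow above this decision is made by an LLM following the system prompt, and `evaluateSandboxResult` is a hypothetical name. Treating a whitespace-only answer as empty is an assumption.

```javascript
// Deterministic version of the eval_js_result rule:
// "success" if ok is true and result.answer is a non-empty string
// (whitespace-only counts as empty here), otherwise "retry".
function evaluateSandboxResult(output) {
  if (!output || output.ok !== true) return "retry";
  const answer = output.result ? output.result.answer : undefined;
  return typeof answer === "string" && answer.trim() !== "" ? "success" : "retry";
}
```

Under this rule, a script that caught an HTTP error and put a friendly apology into `answer` also counts as `success`. That may explain why the first, failed attempt surfaced an apology in chat instead of triggering a repair.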

<h3>
 
 
 5. <code>answer</code>: extract the final user message
</h3>

The final task, <code>answer</code>, is intentionally boring:

<ul>
<li>System prompt: “You are a purely extractive post-processor. Do NOT invent content. Just surface the best existing <code>answer</code> field.”
</li>
<li>
It gets:

<ul>
<li>First run (<code>run_js</code> result).</li>
<li>Second run (<code>run_js_retry</code>), if any.</li>
</ul>


</li>

<li>

Selection rule:

<ul>
<li>Take the last non-empty <code>answer</code> you see.</li>
<li>Output it verbatim.</li>
</ul>


</li>

</ul>

In our case it found: 


<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>Task titled "delectus aut autem" is incomplete.
</code></pre>

</div>



And that’s essentially what Contenox replied in chat.
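The `answer` step's selection rule can also be written deterministically. Here's a sketch, assuming the step receives the sandbox outputs in run order (`run_js` first, then `run_js_retry` if it ran). `pickFinalAnswer` is hypothetical, since in the workflow an LLM performs this extraction.

```javascript
// Returns the last non-empty result.answer across the runs, verbatim, or null.
function pickFinalAnswer(runs) {
  let picked = null;
  for (const run of runs) {
    const answer = run && run.result && run.result.answer;
    if (typeof answer === "string" && answer.trim() !== "") picked = answer;
  }
  return picked;
}
```

Returning `null` instead of inventing a fallback mirrors the "purely extractive" instruction: if no run produced an answer, there is nothing to surface.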




<h2>
 
 
 Why this is interesting (to me, at least)
</h2>

What I originally set out to build:

<blockquote>
A runtime for deterministic, observable GenAI workflows. 
Tasks, transitions, hooks – all explicit and replayable.
</blockquote>

What I accidentally stumbled into:

<blockquote>
A multi-model, self-orchestrating agent pattern, 
where LLMs write code that uses tools, and the runtime executes and evaluates that code.
</blockquote>

The pattern looks like this:

<ol>
<li>Planner LLM (<code>generate_js</code>)
<ul>
<li>Reads user intent + history.</li>
<li>Emits JavaScript that calls <code>httpFetch</code>, <code>executeTask</code>, <code>executeTaskChain</code>, hooks, etc.</li>
</ul>
</li>
<li>Execution environment (<code>run_js</code> in Goja)
<ul>
<li>Deterministic execution of that JS.</li>
<li>Full logs of every HTTP call, every inner LLM call, every step.</li>
</ul>
</li>
<li>Controller LLM (<code>eval_js_result</code>)
<ul>
<li>Looks at the sandbox result.</li>
<li>Decides: is this good enough? Retry? Repair?</li>
</ul>
</li>
<li>Repair LLM (<code>repair_js</code>, if needed)
<ul>
<li>Gets the previous code + error output.</li>
<li>Writes a fixed version of the JS.</li>
</ul>
</li>
<li>Answer LLM (<code>answer</code>)
<ul>
<li>Doesn’t “reason” at all.</li>
<li>Just extracts the final <code>answer</code> text safely.</li>
</ul>
</li>
</ol>

All of that is expressed as an explicit state machine in Contenox.

No hidden loops, no undocumented retries, no magic glue code inside some SDK. It’s all visible in the workflow graph and trace.




To me, that’s the exciting part:

<blockquote>
You don’t have to choose between “boring deterministic workflows” and “fancy agents”. 
You can build the agent on top of deterministic workflows. 
And everything stays self-hosted, inspectable, and auditable if you want.
</blockquote>

The article discusses an experiment where an AI model was allowed to write JavaScript code within a self-hosted runtime called Contenox. The author reflects on a concept regarding tool usage in AI, suggesting that models should generate code that uses tools instead of calling tools directly. This approach was tested by executing the generated JavaScript within the Contenox environment, aiming to enhance the efficiency of AI workflows.

I Let an LLM Write JavaScript Inside My AI Runtime. Here’s What Happened

Caterpillar Inc. was always an unlikely winner in the artificial intelligence craze. It makes the bulk of its money selling equipment, like its yellow earth movers, that has made it a stalwart of American industry.

Caterpillar Inc. has emerged as an unlikely player in the artificial intelligence sector, primarily known for its manufacturing of heavy machinery such as earth movers. The company has historically focused on traditional industrial equipment, making it less aligned with the AI-driven technology trends that have captivated many other sectors. Despite the growing interest in AI, Caterpillar's core business remains rooted in physical machinery, which may limit its appeal in the rapidly evolving tech landscape.

Caterpillar’s Lone Bear Says Machinery Maker Is No AI Darling

<a href="https://www.techspot.com/news/110306-microsoft-explains-how-windows-11-become-agentic-os.html" target="_blank"><img src="https://www.techspot.com/images2/news/ts3_thumbs/2025/11/2025-11-18-ts3_thumbs-d01.jpg" width="800" height="560" style="padding: 15px 0" title="Microsoft explains how Windows 11 will become an agentic OS whether you like it or not" /></a> Windows president Pavan Davuluri recently described the future of Windows as an agentic operating system, where AI bots and large language models handle the user's commands on files and computing tasks. Critics mostly greeted the idea with scorn, cursing, and frustration over the "bug-ridden slop pile" the OS currently is.... <a href="https://www.techspot.com/news/110306-microsoft-explains-how-windows-11-become-agentic-os.html">Read Entire Article</a>

Pavan Davuluri, president of Windows at Microsoft, has outlined plans for Windows 11 to evolve into what he describes as an 'agentic operating system.' This transformation will involve the integration of AI bots and large language models to manage user commands and computing tasks. However, the announcement has been met with skepticism and criticism from users who are frustrated with the current state of the operating system, which they describe as plagued by bugs.

Microsoft explains how Windows 11 will become an agentic OS whether you like it or not

Although Black Friday is still two weeks away, you can find great Nintendo Switch and Switch 2 deals now. I've collected the best from Walmart, Best Buy, and more.

As Black Friday approaches in two weeks, early deals on Nintendo Switch and Switch 2 consoles are already available. Major retailers like Walmart and Best Buy are offering over 20 sales, providing consumers with an opportunity to save on popular gaming products ahead of the holiday shopping rush.

Best early Black Friday Nintendo Switch deals 2025: 20+ sales out early

<A HREF="https://substack.com/redirect/2/eyJlIjoiaHR0cHM6Ly9zb3VyY2VzLm5ld3MvcC9kZW1pcy1oYXNzaWJhcy1vbi1nZW1pbmktMy13b3JsZD91dG1fY2FtcGFpZ249ZW1haWwtcG9zdCZyPTFyODVmJnRva2VuPWV5SjFjMlZ5WDJsa0lqb3lPVFE1T0RreExDSndiM04wWDJsa0lqb3hOemt5TlRnNE9UZ3NJbWxoZENJNk1UYzJNelE1TmpVME5pd2laWGh3SWpveE56WTJNRGc0TlRRMkxDSnBjM01pT2lKd2RXSXRNelV5TlRjNE1DSXNJbk4xWWlJNkluQnZjM1F0Y21WaFkzUnBiMjRpZlEucDdlcWFuMFM3WDNXTXQ1OUY4Y2RYZG1tb1VBRGJNMlBGZDM1c3ZZWUc2YyIsInAiOjE3OTI1ODg5OCwicyI6MzUyNTc4MCwiZiI6ZmFsc2UsInUiOjI5NDk4OTEsImlhdCI6MTc2MzQ5NjU0NiwiZXhwIjoyMDc5MDcyNTQ2LCJpc3MiOiJwdWItMCIsInN1YiI6ImxpbmstcmVkaXJlY3QifQ.MstA6dhKs3CLxJgLSEpyVK_D4Oz9SmcjeXKNnqEDzIg?"><IMG VSPACE="4" HSPACE="4" BORDER="0" ALIGN="RIGHT" SRC="http://www.techmeme.com/251118/i47.jpg"></A>
Alex Heath / <A HREF="https://sources.news/">Sources</A>: 
<A HREF="https://substack.com/redirect/2/eyJlIjoiaHR0cHM6Ly9zb3VyY2VzLm5ld3MvcC9kZW1pcy1oYXNzaWJhcy1vbi1nZW1pbmktMy13b3JsZD91dG1fY2FtcGFpZ249ZW1haWwtcG9zdCZyPTFyODVmJnRva2VuPWV5SjFjMlZ5WDJsa0lqb3lPVFE1T0RreExDSndiM04wWDJsa0lqb3hOemt5TlRnNE9UZ3NJbWxoZENJNk1UYzJNelE1TmpVME5pd2laWGh3SWpveE56WTJNRGc0TlRRMkxDSnBjM01pT2lKd2RXSXRNelV5TlRjNE1DSXNJbk4xWWlJNkluQnZjM1F0Y21WaFkzUnBiMjRpZlEucDdlcWFuMFM3WDNXTXQ1OUY4Y2RYZG1tb1VBRGJNMlBGZDM1c3ZZWUc2YyIsInAiOjE3OTI1ODg5OCwicyI6MzUyNTc4MCwiZiI6ZmFsc2UsInUiOjI5NDk4OTEsImlhdCI6MTc2MzQ5NjU0NiwiZXhwIjoyMDc5MDcyNTQ2LCJpc3MiOiJwdWItMCIsInN1YiI6ImxpbmstcmVkaXJlY3QifQ.MstA6dhKs3CLxJgLSEpyVK_D4Oz9SmcjeXKNnqEDzIg?">Q&amp;A with Demis Hassabis on Gemini 3, spending most of his research time on world models, fitting the entire Google Search index into Gemini, AI bubble, and more</A>&nbsp; &mdash;&nbsp; Demis Hassabis was noticeably relaxed when he joined our virtual call from London.&nbsp; &mdash;&nbsp; It was the day before the release of Gemini 3 &hellip;

Demis Hassabis, co-founder of DeepMind, discussed the advancements of Gemini 3, Google's latest AI model, emphasizing its capabilities in world modeling and fitting the entire Google Search index into the system. He addressed concerns about the AI bubble and highlighted the model's potential to enhance user interactions and provide more accurate information. Hassabis's insights reflect a commitment to pushing the boundaries of AI technology and its integration into everyday applications.

Q&A with Demis Hassabis on Gemini 3, spending most of his research time on world models, fitting the entire Google Search index into Gemini, AI bubble, and more (Alex Heath/Sources)

arXiv:2511.06854v2 Announce Type: replace-cross 
Abstract: Irregularly sampled time series (ISTS), characterized by non-uniform time intervals with natural missingness, are prevalent in real-world applications. Existing approaches for ISTS modeling primarily rely on observed values to impute unobserved ones or infer latent dynamics. However, these methods overlook a critical source of learning signal: the reconstruction error inherently produced during model training. Such error implicitly reflects how well a model captures the underlying data structure and can serve as an informative proxy for unobserved values. To exploit this insight, we propose iTimER, a simple yet effective self-supervised pre-training framework for ISTS representation learning. iTimER models the distribution of reconstruction errors over observed values and generates pseudo-observations for unobserved timestamps through a mixup strategy between sampled errors and the last available observations. This transforms unobserved timestamps into noise-aware training targets, enabling meaningful reconstruction signals. A Wasserstein metric aligns reconstruction error distributions between observed and pseudo-observed regions, while a contrastive learning objective enhances the discriminability of learned representations. Extensive experiments on classification, interpolation, and forecasting tasks demonstrate that iTimER consistently outperforms state-of-the-art methods under the ISTS setting.
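The abstract describes pseudo-observations built by mixing sampled reconstruction errors with the last available observation, but it does not give the mixing formula. The sketch below is one hypothetical reading, not the iTimER method: each unobserved step is filled with the last observed value plus a sampled error scaled by λ ~ U(0, 1). A small seeded PRNG (mulberry32) is used only so the output is reproducible.

```javascript
// Small seeded PRNG (mulberry32) returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// HYPOTHETICAL pseudo-observation fill (the exact iTimER mixing rule is not in
// the abstract). values: array with null for unobserved steps.
// errors: reconstruction errors collected on observed steps (empirical pool).
// ASSUMED rule: x_pseudo = x_last + lambda * e, e drawn from errors, lambda ~ U(0, 1).
function pseudoObserve(values, errors, rng) {
  let last = null; // last *observed* value, never a pseudo-value
  return values.map((v) => {
    if (v !== null) {
      last = v;
      return { value: v, pseudo: false };
    }
    // Nothing observed yet (or no error pool): leave the step missing.
    if (last === null || errors.length === 0) return { value: null, pseudo: false };
    const e = errors[Math.floor(rng() * errors.length)];
    const lambda = rng();
    return { value: last + lambda * e, pseudo: true };
  });
}
```

The `pseudo` flag keeps pseudo-observed steps distinguishable from real ones, which any objective treating the two regions differently (such as the distribution alignment the abstract mentions) would need.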

Irregularly sampled time series (ISTS) are common in real-world applications, characterized by non-uniform time intervals and natural missingness. Traditional ISTS modeling methods often rely on observed values to infer unobserved ones, neglecting the learning signal from reconstruction error produced during model training. The proposed iTimER framework leverages this reconstruction error to enhance representation learning by generating pseudo-observations for unobserved timestamps, thus improving the modeling of ISTS.

Beyond Observations: Reconstruction Error-Guided Irregularly Sampled Time Series Representation Learning

arXiv:2503.09887v2 Announce Type: replace-cross 
Abstract: We develop a novel semigroup stability analysis based on Lyapunov techniques and contraction coefficients to prove exponential convergence of Sinkhorn equations on weighted Banach spaces. This operator-theoretic framework yields exponential decays of Sinkhorn iterates towards Schrödinger bridges with respect to general classes of $\phi$-divergences and Kantorovich-type criteria, including the relative entropy, squared Hellinger integrals, $\alpha$-divergences as well as weighted total variation norms and Wasserstein distances. To the best of our knowledge, these contraction inequalities are the first results of this type in the literature on entropic transport and the Sinkhorn algorithm.
  We also provide Lyapunov contraction principles under minimal regularity conditions that allow us to provide quantitative exponential stability estimates for a large class of Sinkhorn semigroups. We apply this novel framework in a variety of situations, ranging from polynomial growth potentials and heavy-tailed marginals on general normed spaces to more sophisticated boundary state space models, including semi-circle transitions, Beta, Weibull, and exponential marginals, as well as semi-compact models. Last but not least, our approach also allows us to consider statistical finite mixtures of the above models, including kernel-type density estimators of complex data distributions arising in generative modeling.
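As background for the Sinkhorn iterates the abstract refers to, here is a minimal sketch of the classical discrete Sinkhorn algorithm for entropically regularized optimal transport between histograms `a` and `b` with cost matrix `C` and regularization `eps`. This is the textbook finite-dimensional scheme, not the paper's semigroup framework on weighted Banach spaces; the function name and the L1 row-marginal error it records are illustrative choices.

```javascript
// Discrete Sinkhorn iterations: find scalings u, v such that
// P = diag(u) K diag(v), with K = exp(-C / eps), has row sums a and column sums b.
function sinkhorn(a, b, C, eps, iters) {
  const K = C.map((row) => row.map((c) => Math.exp(-c / eps)));
  let u = new Array(a.length).fill(1);
  let v = new Array(b.length).fill(1);
  const errors = [];
  for (let k = 0; k < iters; k++) {
    // Row update: rescale so the row sums match the first marginal a.
    u = a.map((ai, i) => ai / K[i].reduce((s, kij, j) => s + kij * v[j], 0));
    // Column update: rescale so the column sums match the second marginal b.
    v = b.map((bj, j) => bj / K.reduce((s, row, i) => s + u[i] * row[j], 0));
    // After the column update the columns match b exactly (up to rounding),
    // so track the remaining L1 error on the row marginals.
    const rowErr = a.reduce((s, ai, i) => {
      const rowSum = K[i].reduce((t, kij, j) => t + u[i] * kij * v[j], 0);
      return s + Math.abs(rowSum - ai);
    }, 0);
    errors.push(rowErr);
  }
  const P = K.map((row, i) => row.map((kij, j) => u[i] * kij * v[j]));
  return { P, errors };
}
```

For example, with `a = [0.2, 0.8]`, `b = [0.6, 0.4]`, `C = [[0, 1], [1, 0]]`, and `eps = 0.5`, the recorded marginal error decays roughly geometrically. This is a finite-dimensional instance of the exponential convergence the paper studies in much greater generality.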

On the contraction properties of Sinkhorn semigroups

arXiv:2511.13049v1 Announce Type: cross 
Abstract: We study a matrix completion problem where both the ground truth matrix $R$ and the unknown sampling distribution $P$ over observed entries are low-rank matrices and share a common subspace. We assume that a large amount $M$ of unlabeled data drawn from the sampling distribution $P$ is available, together with a small amount $N$ of labeled data drawn from the same distribution and noisy estimates of the corresponding ground truth entries. This setting is inspired by recommender systems scenarios where the unlabeled data corresponds to "implicit feedback" (consisting of interactions such as purchases, clicks, etc.) and the labeled data corresponds to "explicit feedback", consisting of interactions where the user has given an explicit rating to the item. Leveraging powerful results from the theory of low-rank subspace recovery, together with classic generalization bounds for matrix completion models, we show error bounds consisting of a sum of two error terms scaling as $\widetilde{O}\left(\sqrt{\frac{nd}{M}}\right)$ and $\widetilde{O}\left(\sqrt{\frac{dr}{N}}\right)$ respectively, where $d$ is the rank of $P$ and $r$ is the rank of $R$. In synthetic experiments, we confirm that the true generalization error naturally splits into independent error terms corresponding to the estimation of $P$ and of the ground truth matrix $R$ respectively. In real-life experiments on Douban and MovieLens with most explicit ratings removed, we demonstrate that the method can outperform baselines relying only on the explicit ratings, indicating that our assumptions provide a valid toy theoretical setting to study the interaction between explicit and implicit feedback in recommender systems.
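To make the two-term structure of the bound concrete, the sketch below evaluates $\sqrt{nd/M}$ and $\sqrt{dr/N}$ for sample sizes of the kind the recommender setting suggests. This is worked arithmetic, not the paper's estimator: it assumes $n$ denotes the matrix dimension (the abstract does not define it), sets all constants and the logarithmic factors hidden by $\widetilde{O}$ to 1, and the helper name `bound_terms` is hypothetical.

```python
import math

def bound_terms(n, d, r, M, N):
    """Hypothetical helper: the two terms of the stated bound, with constants
    and hidden log factors set to 1 (an illustrative assumption)."""
    # Term the abstract attributes to estimating the sampling distribution P
    # from M unlabeled (implicit-feedback) samples.
    implicit_term = math.sqrt(n * d / M)
    # Term the abstract attributes to estimating the ground truth R
    # from N labeled (explicit-feedback) samples.
    explicit_term = math.sqrt(d * r / N)
    return implicit_term, explicit_term

# Plenty of implicit feedback, little explicit feedback.
small = bound_terms(n=1000, d=10, r=5, M=1_000_000, N=1000)
# Ten times larger matrix, same sample sizes and ranks.
large = bound_terms(n=10_000, d=10, r=5, M=1_000_000, N=1000)
print(small, large)
```

Under these assumptions, growing $n$ only inflates the implicit-feedback term, while the labeled-data term depends on the ranks $d$ and $r$ alone. This is the sense in which abundant unlabeled data can make a small amount of explicit feedback sufficient.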

The article discusses a matrix completion problem where both the ground truth matrix and the unknown sampling distribution are low-rank matrices sharing a common subspace. It assumes a large amount of unlabeled data from the sampling distribution is available alongside a small amount of labeled data, a setting inspired by recommender systems. The study leverages low-rank subspace recovery results and classic generalization bounds for matrix completion models to derive error bounds, contributing to advancements in semi-supervised learning techniques.
