arXiv:2511.07700v1 Announce Type: new 
Abstract: Artificial Intelligence (AI) models have demonstrated expert-level performance in melanoma detection, yet their clinical adoption is hindered by performance disparities across demographic subgroups such as gender, race, and age. Previous efforts to benchmark the performance of AI models have primarily focused on assessing model performance using group fairness metrics that rely on the Area Under the Receiver Operating Characteristic curve (AUROC), which does not provide insights into a model's ability to provide accurate estimates. In line with clinical assessments, this paper addresses this gap by incorporating calibration as a complementary benchmarking metric to AUROC-based fairness metrics. Calibration evaluates the alignment between predicted probabilities and observed event rates, offering deeper insights into subgroup biases. We assess the performance of the leading skin cancer detection algorithm of the ISIC 2020 Challenge on the ISIC 2020 Challenge dataset and the PROVE-AI dataset, and compare it with the second and third place models, focusing on subgroups defined by sex, race (Fitzpatrick Skin Tone), and age. Our findings reveal that while existing models enhance discriminative accuracy, they often over-diagnose risk and exhibit calibration issues when applied to new datasets. This study underscores the necessity for comprehensive model auditing strategies and extensive metadata collection to achieve equitable AI-driven healthcare solutions. All code is publicly available at https://github.com/bdominique/testing_strong_calibration.
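To make "alignment between predicted probabilities and observed event rates" concrete, here is a minimal sketch of one common calibration summary, the expected calibration error (ECE) with equal-width bins. The abstract does not say which calibration measure the paper uses, so the choice of ECE, the function name `expectedCalibrationError`, and the toy numbers are all assumptions made for illustration.

```javascript
// Hypothetical helper: expected calibration error with equal-width probability bins.
// probs: predicted probabilities in [0, 1]; labels: observed outcomes (0 or 1).
function expectedCalibrationError(probs, labels, numBins) {
  var bins = [];
  for (var b = 0; b < numBins; b++) {
    bins.push({ n: 0, sumP: 0, sumY: 0 });
  }
  for (var i = 0; i < probs.length; i++) {
    // A probability of exactly 1.0 falls into the last bin.
    var idx = Math.min(numBins - 1, Math.floor(probs[i] * numBins));
    bins[idx].n += 1;
    bins[idx].sumP += probs[i];
    bins[idx].sumY += labels[i];
  }
  var ece = 0;
  for (var k = 0; k < numBins; k++) {
    if (bins[k].n === 0) continue;
    var meanPredicted = bins[k].sumP / bins[k].n;
    var observedRate = bins[k].sumY / bins[k].n;
    // Weight each bin's gap by the share of samples it holds.
    ece += (bins[k].n / probs.length) * Math.abs(meanPredicted - observedRate);
  }
  return ece;
}

// Toy data (made up): a model that predicts 0.8 for a subgroup where only 1 in 4 cases is positive.
var overDiagnosed = expectedCalibrationError([0.8, 0.8, 0.8, 0.8], [1, 0, 0, 0], 2);
console.log(overDiagnosed.toFixed(2));
```

Computing such a score separately per subgroup (sex, skin tone, age band) is one way to surface a model that ranks cases well, and so has a good AUROC, but systematically overstates risk for some groups.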


A recent study highlights the importance of calibration in benchmarking algorithmic fairness for skin cancer detection. While AI models show expert-level performance in melanoma detection, disparities exist across demographic groups. This research proposes incorporating calibration alongside traditional metrics to better understand subgroup biases, emphasizing the need for comprehensive model auditing and metadata collection to enhance fairness in AI applications.

On the Role of Calibration in Benchmarking Algorithmic Fairness for Skin Cancer Detection


Google Gemini 3 has achieved a significant milestone by surpassing all existing AI benchmarks, including the most challenging ones. This accomplishment highlights the advancements made by Google's AI team and positions Gemini 3 as a leading contender in the artificial intelligence landscape. The success is celebrated within the tech community, reflecting the ongoing evolution of AI technologies and their capabilities.

Google Gemini 3 Just Killed Every AI Benchmark, Including the Hardest of All

Two weeks ago I read a line about tool use with Claude that stuck in my head. Paraphrased:

<blockquote>
Direct tool calls don’t really scale. 
Have the model write code that uses tools, and execute that code instead.
</blockquote>

At the same time, I was knee-deep in wiring a JavaScript execution environment into Contenox, my self-hosted runtime for deterministic, chat-native AI workflows.

So of course the thought was:

<blockquote>
What if I just let the model write the JavaScript and run it inside the runtime? 😅
</blockquote>

This post is about what happened when I tried exactly that.




<h2>What is Contenox?</h2>

Very short version:

<blockquote>
Contenox is a self-hostable runtime for sovereign GenAI applications. 
It models AI behavior as explicit state machines, not opaque prompt chains.
</blockquote>

Some key properties:

<ul>
<li>Runtime, not a library</li>
<li>Explicit state machines</li>
<li>Chat-native interface</li>
<li>Vendor-agnostic &amp; self-hosted</li>
<li>Written in Go with lots of passion and zero tolerance for shortcuts</li>
</ul>




<h2>The experiment: ask it to fetch and summarize a TODO</h2>

Once the JS execution hook was in place (a Goja VM with some globals exposed), I wired up a new state machine: 


<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>mux_input → moderate → generate_js → run_js → eval_js_result → (repair_js?) → answer
</code></pre>

</div>
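The chain above can be read as a transition table. The sketch below is a hypothetical JavaScript rendering, not Contenox's actual workflow format: the `transitions` object, the `nextState` and `walk` helpers, and the outcome labels (`ok`, `success`, `retry`) are assumptions made for illustration, and routing `repair_js` through `run_js_retry` is inferred from the task names mentioned later in this post.

```javascript
// Hypothetical transition table; task names come from the post, outcome labels are assumed.
var transitions = {
  mux_input:      { ok: "moderate" },
  moderate:       { ok: "generate_js" },
  generate_js:    { ok: "run_js" },
  run_js:         { ok: "eval_js_result" },
  eval_js_result: { success: "answer", retry: "repair_js" },
  repair_js:      { ok: "run_js_retry" },
  run_js_retry:   { ok: "answer" },
  answer:         {}
};

// Returns the next task name, or null when the current task has no matching edge.
function nextState(state, outcome) {
  var edges = transitions[state];
  if (!edges) throw new Error("unknown state: " + state);
  var target = edges[outcome];
  return target === undefined ? null : target;
}

// Follows edges from `start`, asking `outcomeFor` what each task returned.
function walk(start, outcomeFor) {
  var path = [start];
  var state = start;
  while (true) {
    var next = nextState(state, outcomeFor(state));
    if (next === null) break;
    path.push(next);
    state = next;
  }
  return path;
}

var happyPath = walk("mux_input", function (state) {
  return state === "eval_js_result" ? "success" : "ok";
});
console.log(happyPath.join(" -> "));
```

Because every edge is written down, a retry can only happen where the table says it can, which is the property the rest of this post leans on.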



Then I opened the Contenox chat UI and typed:

<blockquote>
Me: “Hey Contenox, fetch the TODO item from jsonplaceholder and summarize it.”
</blockquote>

The first attempt didn’t go as planned.

Instead of a neat summary, I got:

<blockquote>
Contenox: 
“I'm sorry, but it seems there was a problem fetching the TODO item from the provided URL due to an Internal Server Error (status code: 500). Unfortunately, I couldn't retrieve any data at this time. My apologies for the inconvenience!”
</blockquote>

I checked manually – Cloudflare was having a moment, JSONPlaceholder was behind it, and the request really was failing. Okay, fair. Long coffee break.

A bit later, I went back to the same chat and typed:

<blockquote>
Me: “okay let's try again now.”
</blockquote>

This time the reply was:

<blockquote>
Contenox: 
<code>Task titled "delectus aut autem" is incomplete. Nothing to summarize here.</code>
</blockquote>

Wait. It worked? Just like that?




<h2>Under the hood: LLM → JavaScript → tools → LLM</h2>

Looking at the execution trace made it clear what happened.

<h3>1. Input is normalized and moderated</h3>

The workflow starts with:

<ol>
<li>
<code>mux_input</code>

<ul>
<li>Parses the incoming chat history into a normalized <code>messages</code> array.</li>
</ul>
</li>
<li>
<code>moderate</code>

<ul>
<li>Uses a small model to classify the input as safe/unsafe.</li>
<li>Output: <code>0</code> → safe → continue.</li>
</ul>
</li>
</ol>

So far this is standard workflow stuff.

<h3>2. <code>generate_js</code>: the LLM writes JavaScript</h3>

Next, the <code>generate_js</code> task runs with a system instruction like:

<ul>
<li>“You are an expert JavaScript agent for the Contenox runtime…”</li>
<li>“You must output JSON: { "code": "" }”</li>
<li>“You can use httpFetch, executeTask, executeTaskChain, …”</li>
<li>“Script must be synchronous, no async/await, and must return a JSON-serializable object.”</li>
</ul>
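The output contract in those instructions can be checked before anything is executed. The following is a minimal sketch of such a check, not something Contenox is known to ship: `parsePlannerOutput` is a hypothetical name, and the regular expression is a deliberately crude stand-in for the "no async/await" rule (it would also reject a script that merely mentions those words inside a string).

```javascript
// Hypothetical validator for the planner's reply; not part of Contenox.
function parsePlannerOutput(raw) {
  var parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    return { ok: false, error: "not valid JSON: " + e.message };
  }
  if (parsed === null || typeof parsed !== "object" || typeof parsed.code !== "string") {
    return { ok: false, error: "expected an object with a string \"code\" field" };
  }
  if (parsed.code.trim() === "") {
    return { ok: false, error: "\"code\" is empty" };
  }
  // Crude textual check; a real implementation would parse the script instead.
  if (/\b(async|await)\b/.test(parsed.code)) {
    return { ok: false, error: "script must be synchronous" };
  }
  return { ok: true, code: parsed.code };
}

console.log(parsePlannerOutput('{"code":"var x = 1; x;"}').ok);
console.log(parsePlannerOutput('{"code":"await httpFetch(url)"}').error);
```

Rejecting a malformed reply at this point means a bad generation can be sent to repair without ever reaching the sandbox.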

Given the chat history, the model produced something like this (simplified): 


<div class="highlight js-code-highlight">
<pre class="highlight json"><code>{
  "code": "var result;
var answer;
var meta = { steps: [] };

var todoUrl = \"https://jsonplaceholder.typicode.com/todos/1\";

// Step 1: Fetch the TODO item
meta.steps.push(\"Fetching TODO item from \" + todoUrl);
var fetchResponse = httpFetch(todoUrl);

if (!fetchResponse.ok) {
  answer = \"I couldn't access that item right now because there was some trouble with getting it. Please check the URL or try again later.\";
  meta.error = {
    message: \"HTTP fetch failed\",
    url: todoUrl,
    status: fetchResponse.status,
    statusText: fetchResponse.statusText,
    body: fetchResponse.body,
    error: fetchResponse.error
  };
  result = { answer: answer, meta: meta };
} else {
  var todoItem;
  try {
    todoItem = JSON.parse(fetchResponse.body);
    meta.raw = todoItem;
  } catch (e) {
    answer = \"I fetched the item, but I couldn't understand its format. It doesn't seem to be valid JSON.\";
    meta.error = {
      message: \"JSON parse error\",
      rawBody: fetchResponse.body,
      parseError: e.toString()
    };
    result = { answer: answer, meta: meta };
  }

  if (todoItem) {
    // Step 2: Summarize the TODO item using an LLM
    meta.steps.push(\"Summarizing TODO item with LLM\");
    var prompt = \"Summarize the following TODO item in a concise sentence: \" + JSON.stringify(todoItem);
    var summarizeTask = executeTask(prompt, \"gpt-4o-mini\", \"openai\");

    if (!summarizeTask.success) {
      answer = \"While I managed to fetch your TODO details, the summary tool had a hiccup and couldn't provide me with a concise sentence right now. We seem to be experiencing some temporary issues.\";
      meta.error = {
        message: \"LLM task execution failed\",
        task_id: summarizeTask.task_id,
        error: summarizeTask.error
      };
      result = { answer: answer, meta: meta };
    } else {
      answer = summarizeTask.response;
      meta.llmResponse = summarizeTask.response;
      result = { answer: answer, meta: meta };
    }
  }
}

result;"
}
</code></pre>

</div>



A few important details here:

<ul>
<li>
It encodes error handling:

<ul>
<li>If HTTP fails → user-friendly <code>answer</code>, raw error in <code>meta.error</code>.</li>
<li>If JSON parse fails → similar pattern.</li>
<li>If the inner LLM call fails → fallback message.</li>
</ul>


</li>

<li>It delegates the actual summarization to another model via <code>executeTask</code>.</li>

<li>It returns a structured <code>result</code> with both <code>answer</code> and <code>meta</code>.</li>

</ul>

This is not the model “calling tools” directly. It’s the model writing a program that calls tools.

<h3>3. <code>run_js</code>: execute the code in a sandbox</h3>

The next task is <code>run_js</code>, which is just a Contenox <code>hook</code> that calls the JS sandbox: 


<div class="highlight js-code-highlight">
<pre class="highlight json"><code>{
  "name": "js_sandbox",
  "tool_name": "execute_js",
  "args": {
    "code": "{{.generate_js.code}}"
  }
}
</code></pre>

</div>



Inside the trace you can see:

<ul>
<li>An <code>httpFetch</code> log for the JSONPlaceholder URL.</li>
<li>A response with <code>status: 200 OK</code> when things finally worked.</li>
<li>
An <code>executeTask</code> log with the summarization prompt:

<ul>
<li><code>Summarize the following TODO item in a concise sentence: {"userId":1,"id":1,"title":"delectus aut autem","completed":false}</code></li>
</ul>


</li>

</ul>

The sandbox result looked roughly like: 


<div class="highlight js-code-highlight">
<pre class="highlight json"><code>{
  "ok": true,
  "result": {
    "answer": "Task titled \"delectus aut autem\" is incomplete.",
    "meta": {
      "llmResponse": "Task titled \"delectus aut autem\" is incomplete.",
      "raw": {
        "userId": 1,
        "id": 1,
        "title": "delectus aut autem",
        "completed": false
      },
      "steps": [
        "Fetching TODO item from https://jsonplaceholder.typicode.com/todos/1",
        "Summarizing TODO item with LLM"
      ]
    }
  },
  "logs": [ ... ],
  "code": "var result; ..."
}
</code></pre>

</div>
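To get a feel for this step without the runtime, a generated script can be run against stubbed globals. The sketch below uses Node's built-in `vm` module as a stand-in for the Goja VM; `runInSandbox` and the stub return shapes are assumptions that only mirror the fields the generated script reads (`ok`, `status`, `body`, `success`, `response`), and the envelope imitates the `ok` / `result` / `logs` / `code` layout shown above. Node's `vm` is not a security boundary, so this is for local experiments only.

```javascript
var vm = require("vm");

// Hypothetical local harness: runs a script with logged, stubbed tool globals.
function runInSandbox(code, stubs) {
  var logs = [];
  var sandbox = {
    httpFetch: function (url) {
      logs.push("httpFetch " + url);
      return stubs.httpFetch(url);
    },
    executeTask: function (prompt, model, provider) {
      logs.push("executeTask " + model + " via " + provider);
      return stubs.executeTask(prompt, model, provider);
    }
  };
  try {
    // The script's completion value (its last expression) becomes the result.
    var result = vm.runInNewContext(code, sandbox, { timeout: 1000 });
    return { ok: true, result: result, logs: logs, code: code };
  } catch (e) {
    return { ok: false, error: String(e), logs: logs, code: code };
  }
}

// A much shorter script than the generated one, following the same pattern.
var script =
  'var r = httpFetch("https://jsonplaceholder.typicode.com/todos/1");' +
  'var todo = JSON.parse(r.body);' +
  'var s = executeTask("Summarize: " + r.body, "gpt-4o-mini", "openai");' +
  '({ answer: s.response, meta: { raw: todo } });';

var out = runInSandbox(script, {
  httpFetch: function () {
    return { ok: true, status: 200, body: '{"userId":1,"id":1,"title":"delectus aut autem","completed":false}' };
  },
  executeTask: function () {
    return { success: true, response: 'Task titled "delectus aut autem" is incomplete.' };
  }
});
console.log(JSON.stringify(out.result));
```

Wrapping each global is what makes the trace possible: every tool call passes through a function the host controls, so it can be logged before it runs.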



<h3>4. <code>eval_js_result</code>: success or retry?</h3>

Now comes the evaluator:

<ul>
<li>It receives a description of the JS sandbox output.</li>
<li>
The system prompt is very strict:

<ul>
<li>If <code>ok</code> is true and there is a non-empty <code>result.answer</code> → respond with <code>success</code>.</li>
<li>Otherwise → respond with <code>retry</code>.</li>
</ul>


</li>

</ul>
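In the workflow this decision is made by an LLM following that prompt. The rule itself is small enough to write down as plain code, which makes a handy reference when checking the evaluator's verdicts; `evaluateSandboxOutput` is a hypothetical helper, not a Contenox function.

```javascript
// Hypothetical deterministic version of the evaluator's rule.
function evaluateSandboxOutput(output) {
  var answer = output && output.result ? output.result.answer : undefined;
  var hasAnswer = typeof answer === "string" && answer.trim() !== "";
  return output && output.ok === true && hasAnswer ? "success" : "retry";
}

console.log(evaluateSandboxOutput({ ok: true, result: { answer: "Task is incomplete." } }));
console.log(evaluateSandboxOutput({ ok: false, error: "ReferenceError: foo is not defined" }));
```

Note that under this rule the earlier failed fetch also counts as `success`: the script ran, and it returned a non-empty, user-friendly `answer` explaining the 500 error.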

On the successful run, it answered: 


<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>success
</code></pre>

</div>



So the workflow does not go into <code>repair_js</code> or <code>run_js_retry</code>. Happy path.

<h3>5. <code>answer</code>: extract the final user message</h3>

The final task, <code>answer</code>, is intentionally boring:

<ul>
<li>System prompt: “You are a purely extractive post-processor. Do NOT invent content. Just surface the best existing <code>answer</code> field.”
</li>
<li>
It gets:

<ul>
<li>First run (<code>run_js</code> result).</li>
<li>Second run (<code>run_js_retry</code>), if any.</li>
</ul>


</li>

<li>

Selection rule:

<ul>
<li>Take the last non-empty <code>answer</code> you see.</li>
<li>Output it verbatim.</li>
</ul>


</li>

</ul>
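The selection rule amounts to a backwards scan over the runs. A minimal sketch, assuming each run has the sandbox envelope shape shown earlier; `pickFinalAnswer` is a hypothetical name, and in the actual workflow an LLM performs this extraction.

```javascript
// Hypothetical helper: return the last non-empty answer across runs, verbatim.
function pickFinalAnswer(runs) {
  for (var i = runs.length - 1; i >= 0; i--) {
    var run = runs[i];
    var answer = run && run.result ? run.result.answer : undefined;
    if (typeof answer === "string" && answer.trim() !== "") {
      return answer;
    }
  }
  return null; // no run produced an answer
}

var firstRun = { ok: true, result: { answer: "Task titled \"delectus aut autem\" is incomplete." } };
var noRetry = null; // run_js_retry did not happen on the happy path
console.log(pickFinalAnswer([firstRun, noRetry]));
```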

In our case it found: 


<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>Task titled "delectus aut autem" is incomplete.
</code></pre>

</div>



And that’s exactly what Contenox replied in chat.




<h2>Why this is interesting (to me, at least)</h2>

What I originally set out to build:

<blockquote>
A runtime for deterministic, observable GenAI workflows. 
Tasks, transitions, hooks – all explicit and replayable.
</blockquote>

What I accidentally stumbled into:

<blockquote>
A multi-model, self-orchestrating agent pattern, 
where LLMs write code that uses tools, and the runtime executes and evaluates that code.
</blockquote>

The pattern looks like this:

<ol>
<li>
Planner LLM (<code>generate_js</code>)

<ul>
<li>Reads user intent + history.</li>
<li>Emits JavaScript that calls <code>httpFetch</code>, <code>executeTask</code>, <code>executeTaskChain</code>, hooks, etc.</li>
</ul>
</li>
<li>
Execution environment (<code>run_js</code> in Goja)

<ul>
<li>Deterministic execution of that JS.</li>
<li>Full logs of every HTTP call, every inner LLM call, every step.</li>
</ul>
</li>
<li>
Controller LLM (<code>eval_js_result</code>)

<ul>
<li>Looks at the sandbox result.</li>
<li>Decides: is this good enough? Retry? Repair?</li>
</ul>
</li>
<li>
Repair LLM (<code>repair_js</code>, if needed)

<ul>
<li>Gets the previous code + error output.</li>
<li>Writes a fixed version of the JS.</li>
</ul>
</li>
<li>
Answer LLM (<code>answer</code>)

<ul>
<li>Doesn’t “reason” at all.</li>
<li>Just extracts the final <code>answer</code> text safely.</li>
</ul>
</li>
</ol>

All of that is expressed as an explicit state machine in Contenox.

No hidden loops, no undocumented retries, no magic glue code inside some SDK. It’s all visible in the workflow graph and trace.




To me, that’s the exciting part:

<blockquote>
You don’t have to choose between “boring deterministic workflows” and “fancy agents”. 
You can build the agent on top of deterministic workflows. 
And everything stays self-hosted, inspectable, and auditable, if you want.
</blockquote>


The article describes an experiment in which an LLM was allowed to write JavaScript inside Contenox, the author's self-hosted AI runtime. It starts from the idea that direct tool calls do not scale well and that models should instead write code that uses tools. The author tested this by having the runtime execute the generated JavaScript to fetch and summarize a TODO item, with every step expressed as an explicit state machine.

I Let an LLM Write JavaScript Inside My AI Runtime. Here’s What Happened

Caterpillar Inc. was always an unlikely winner in the artificial intelligence craze. It makes the bulk of its money selling equipment, like the yellow earth movers that have made it a stalwart of American industry.


Caterpillar Inc. has emerged as an unlikely player in the artificial intelligence sector, primarily known for its manufacturing of heavy machinery such as earth movers. The company has historically focused on traditional industrial equipment, making it less aligned with the AI-driven technology trends that have captivated many other sectors. Despite the growing interest in AI, Caterpillar's core business remains rooted in physical machinery, which may limit its appeal in the rapidly evolving tech landscape.

Caterpillar’s Lone Bear Says Machinery Maker Is No AI Darling

Windows president Pavan Davuluri recently described the future of Windows as an agentic operating system, where AI bots and large language models handle the user's commands on files and computing tasks. Critics mostly greeted the idea with scorn, cursing, and frustration over the "bug-ridden slop pile" the OS currently is... <a href="https://www.techspot.com/news/110306-microsoft-explains-how-windows-11-become-agentic-os.html">Read Entire Article</a>


Windows president Pavan Davuluri has outlined plans for Windows 11 to evolve into what he describes as an agentic operating system, in which AI bots and large language models handle user commands and computing tasks. The announcement has been met with scorn and criticism from users who are frustrated with the current state of the operating system, which they describe as bug-ridden.

Microsoft explains how Windows 11 will become an agentic OS whether you like it or not

Although Black Friday is still two weeks away, you can find great Nintendo Switch and Switch 2 deals now. I've collected the best from Walmart, Best Buy, and more.


With Black Friday still two weeks away, early deals on Nintendo Switch and Switch 2 are already available. Major retailers such as Walmart and Best Buy have more than 20 sales live, giving shoppers a chance to save on popular gaming products ahead of the holiday shopping rush.

Best early Black Friday Nintendo Switch deals 2025: 20+ sales out early

Alex Heath / <A HREF="https://sources.news/">Sources</A>: 
<A HREF="https://substack.com/redirect/2/eyJlIjoiaHR0cHM6Ly9zb3VyY2VzLm5ld3MvcC9kZW1pcy1oYXNzaWJhcy1vbi1nZW1pbmktMy13b3JsZD91dG1fY2FtcGFpZ249ZW1haWwtcG9zdCZyPTFyODVmJnRva2VuPWV5SjFjMlZ5WDJsa0lqb3lPVFE1T0RreExDSndiM04wWDJsa0lqb3hOemt5TlRnNE9UZ3NJbWxoZENJNk1UYzJNelE1TmpVME5pd2laWGh3SWpveE56WTJNRGc0TlRRMkxDSnBjM01pT2lKd2RXSXRNelV5TlRjNE1DSXNJbk4xWWlJNkluQnZjM1F0Y21WaFkzUnBiMjRpZlEucDdlcWFuMFM3WDNXTXQ1OUY4Y2RYZG1tb1VBRGJNMlBGZDM1c3ZZWUc2YyIsInAiOjE3OTI1ODg5OCwicyI6MzUyNTc4MCwiZiI6ZmFsc2UsInUiOjI5NDk4OTEsImlhdCI6MTc2MzQ5NjU0NiwiZXhwIjoyMDc5MDcyNTQ2LCJpc3MiOiJwdWItMCIsInN1YiI6ImxpbmstcmVkaXJlY3QifQ.MstA6dhKs3CLxJgLSEpyVK_D4Oz9SmcjeXKNnqEDzIg?">Q&amp;A with Demis Hassabis on Gemini 3, spending most of his research time on world models, fitting the entire Google Search index into Gemini, AI bubble, and more</A>&nbsp; &mdash;&nbsp; Demis Hassabis was noticeably relaxed when he joined our virtual call from London.&nbsp; &mdash;&nbsp; It was the day before the release of Gemini 3 &hellip;


Demis Hassabis, co-founder of DeepMind, discussed the advancements of Gemini 3, Google's latest AI model, emphasizing its capabilities in world modeling and fitting the entire Google Search index into the system. He addressed concerns about the AI bubble and highlighted the model's potential to enhance user interactions and provide more accurate information. Hassabis's insights reflect a commitment to pushing the boundaries of AI technology and its integration into everyday applications.

Q&A with Demis Hassabis on Gemini 3, spending most of his research time on world models, fitting the entire Google Search index into Gemini, AI bubble, and more (Alex Heath/Sources)

arXiv:2511.08986v2 Announce Type: replace 
Abstract: Randomized controlled trials (RCTs) are indispensable for establishing the clinical value of medical artificial-intelligence (AI) tools, yet their high cost and long timelines hinder timely validation as new models emerge rapidly. Here, we propose BRIDGE, a data-reuse RCT design for AI-based risk models. AI risk models support a broad range of interventions, including screening, treatment selection, and clinical alerts. BRIDGE trials recycle participant-level data from completed trials of AI models when legacy and updated models make concordant predictions, thereby reducing the enrollment requirement for subsequent trials. We provide a practical checklist for investigators to assess whether reusing data from previous trials allows for valid causal inference and preserves type I error. Using real-world datasets across breast cancer, cardiovascular disease, and sepsis, we demonstrate concordance between successive AI models, with up to 64.8% overlap in top 5% high-risk cohorts. We then simulate a series of breast cancer screening studies, where our design reduced required enrollment by 46.6%, saving over US$2.8 million, while maintaining 80% power. By transforming trials into adaptive, modular studies, our proposed design makes Level I evidence generation feasible for every model iteration, thereby accelerating cost-effective translation of AI into routine care.


Randomized controlled trials (RCTs) are essential for validating the clinical effectiveness of medical AI tools, but their high cost and long timelines make it hard to keep pace with new models. The proposed BRIDGE design reuses participant-level data from completed trials when the legacy and updated AI models make concordant predictions. In simulated breast cancer screening studies, this reduced required enrollment by 46.6% and saved over US$2.8 million while maintaining 80% statistical power. Concordance between successive models was also demonstrated on real-world breast cancer, cardiovascular disease, and sepsis datasets.

Data reuse enables cost-efficient randomized trials of medical AI models
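
The concordance check mentioned in the abstract above (overlap between the top 5% high-risk cohorts of successive models) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and "overlap" is assumed here to mean the share of the legacy model's top-5% cohort that also appears in the updated model's top-5% cohort. The paper may define it differently.

```python
def top_fraction_ids(scores, fraction=0.05):
    """Return the indices of participants whose risk scores are in the top `fraction`.

    `scores` is a list of predicted risks, one per participant (hypothetical input).
    """
    k = max(1, int(round(len(scores) * fraction)))
    # Rank participants from highest to lowest predicted risk.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def top_fraction_overlap(legacy_scores, updated_scores, fraction=0.05):
    """Share of the legacy high-risk cohort that the updated model also flags.

    Assumption: both models score the same participants in the same order.
    """
    legacy_top = top_fraction_ids(legacy_scores, fraction)
    updated_top = top_fraction_ids(updated_scores, fraction)
    return len(legacy_top & updated_top) / len(legacy_top)
```

Because both cohorts have the same size, this ratio is symmetric in the two models. Under this reading, a value of 0.648 would correspond to the "64.8% overlap" quoted in the abstract.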

arXiv:2511.10809v2 Announce Type: replace 
Abstract: Linear Predictive Clustering (LPC) partitions samples based on shared linear relationships between feature and target variables, with numerous applications including marketing, medicine, and education. Greedy optimization methods, commonly used for LPC, alternate between clustering and linear regression but lack global optimality. While effective for separable clusters, they struggle in non-separable settings where clusters overlap in feature space. In an alternative constrained optimization paradigm, Bertsimas and Shioda (2007) formulated LPC as a Mixed-Integer Program (MIP), ensuring global optimality regardless of separability but suffering from poor scalability. This work builds on the constrained optimization paradigm to introduce two novel approaches that improve the efficiency of global optimization for LPC. By leveraging key theoretical properties of separability, we derive near-optimal approximations with provable error bounds, significantly reducing the MIP formulation's complexity and improving scalability. Additionally, we can further approximate LPC as a Quadratic Pseudo-Boolean Optimization (QPBO) problem, achieving substantial computational improvements in some settings. Comparative analyses on synthetic and real-world datasets demonstrate that our methods consistently achieve near-optimal solutions with substantially lower regression errors than greedy optimization while exhibiting superior scalability over existing MIP formulations.

Near-optimal Linear Predictive Clustering in Non-separable Spaces via Mixed Integer Programming and Quadratic Pseudo-Boolean Reductions
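
The greedy baseline that the abstract above contrasts with global optimization alternates between fitting one regression per cluster and reassigning each sample to the best-fitting regression. The sketch below illustrates that alternation under simplifying assumptions that are not from the paper: a single feature, squared-error loss, random initial labels, and hypothetical function names. It is not the MIP or QPBO method the authors propose.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # All x values identical: fall back to a constant model.
        return 0.0, mean_y
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return a, mean_y - a * mean_x


def greedy_lpc(xs, ys, n_clusters=2, n_iters=50, seed=0):
    """Alternate regression and reassignment until the labels stop changing."""
    rng = random.Random(seed)
    labels = [rng.randrange(n_clusters) for _ in xs]
    models = [(0.0, 0.0)] * n_clusters  # (slope, intercept) per cluster
    for _ in range(n_iters):
        # Regression step: refit each cluster that has at least two members.
        for k in range(n_clusters):
            members = [i for i, lab in enumerate(labels) if lab == k]
            if len(members) >= 2:
                models[k] = fit_line([xs[i] for i in members],
                                     [ys[i] for i in members])
        # Assignment step: move each sample to the line with the smallest squared residual.
        new_labels = [
            min(range(n_clusters),
                key=lambda k: (y - (models[k][0] * x + models[k][1])) ** 2)
            for x, y in zip(xs, ys)
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, models
```

Each step can only lower or keep the total squared error, so the loop converges, but only to a local optimum that depends on the initial labels. That dependence is the lack of global optimality the abstract refers to.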

arXiv:2511.07947v2 Announce Type: replace-cross 
Abstract: Machine learning models constitute valuable intellectual property, yet remain vulnerable to model extraction attacks (MEA), where adversaries replicate their functionality through black-box queries. Model watermarking counters MEAs by embedding forensic markers for ownership verification. Current black-box watermarks prioritize MEA survival through representation entanglement, yet inadequately explore resilience against sequential MEAs and removal attacks. Our study reveals that this risk is underestimated because existing removal methods are weakened by entanglement. To address this gap, we propose Watermark Removal attacK (WRK), which circumvents entanglement constraints by exploiting decision boundaries shaped by prevailing sample-level watermark artifacts. WRK effectively reduces watermark success rates by at least 88.79% across existing watermarking benchmarks.
  For robust protection, we propose Class-Feature Watermarks (CFW), which improve resilience by leveraging class-level artifacts. CFW constructs a synthetic class using out-of-domain samples, eliminating vulnerable decision boundaries between original domain samples and their artifact-modified counterparts (watermark samples). CFW concurrently optimizes both MEA transferability and post-MEA stability. Experiments across multiple domains show that CFW consistently outperforms prior methods in resilience, maintaining a watermark success rate of at least 70.15% in extracted models even under the combined MEA and WRK distortion, while preserving the utility of protected models.

Class-feature Watermark: A Resilient Black-box Watermark Against Model Extraction Attacks

arXiv:2511.06854v2 Announce Type: replace-cross 
Abstract: Irregularly sampled time series (ISTS), characterized by non-uniform time intervals with natural missingness, are prevalent in real-world applications. Existing approaches for ISTS modeling primarily rely on observed values to impute unobserved ones or infer latent dynamics. However, these methods overlook a critical source of learning signal: the reconstruction error inherently produced during model training. Such error implicitly reflects how well a model captures the underlying data structure and can serve as an informative proxy for unobserved values. To exploit this insight, we propose iTimER, a simple yet effective self-supervised pre-training framework for ISTS representation learning. iTimER models the distribution of reconstruction errors over observed values and generates pseudo-observations for unobserved timestamps through a mixup strategy between sampled errors and the last available observations. This transforms unobserved timestamps into noise-aware training targets, enabling meaningful reconstruction signals. A Wasserstein metric aligns reconstruction error distributions between observed and pseudo-observed regions, while a contrastive learning objective enhances the discriminability of learned representations. Extensive experiments on classification, interpolation, and forecasting tasks demonstrate that iTimER consistently outperforms state-of-the-art methods under the ISTS setting.

Irregularly sampled time series (ISTS) are common in real-world applications, characterized by non-uniform time intervals and natural missingness. Traditional ISTS modeling methods often rely on observed values to infer unobserved ones, neglecting the learning signal from reconstruction error produced during model training. The proposed iTimER framework leverages this reconstruction error to enhance representation learning by generating pseudo-observations for unobserved timestamps, thus improving the modeling of ISTS.

On the Role of Calibration in Benchmarking Algorithmic Fairness for Skin Cancer Detection
